
Lecture 03.

Discrete Random Variables


(Chapter 4)

Ping Yu

HKU Business School


The University of Hong Kong



Plan of This Lecture

Random Variables
Probability Distributions for Discrete Random Variables
Properties of Discrete Random Variables
Binomial Distribution
Poisson Distribution
Hypergeometric Distribution
Jointly Distributed Discrete Random Variables



Random Variables



Discrete Random Variables

A random variable (r.v.) is a variable that takes on numerical values realized by the
outcomes in the sample space generated by a random experiment.
- Mathematically, a random variable is a function from S to R.
- In this and the next lecture, we use capital letters, such as X, to denote the random
variable, and the corresponding lowercase letter, x, to denote a possible value.
A discrete random variable is a random variable that can take on no more than a
countable number of values.
- e.g., the number of customers visiting a store in one day, the number of claims on
a medical insurance policy in a particular year, etc.
- "Countable" includes "finite" and "countably infinite".




Continuous Random Variables

A continuous random variable is a random variable that can take any value in an
interval (i.e., for any two values, there is some third value that lies between them).
- e.g., the yearly income for a family, the highest temperature in one day, etc.
- The probability can only be assigned to a range of values since the probability of
a single value is always zero.
Recall the distinction between discrete numerical variables and continuous
numerical variables in Lecture 1.
Modeling a r.v. as continuous is usually for convenience as the differences
between adjacent discrete values (e.g., $35,276.21 and $35,276.22) are of no
importance.
On the other hand, we model a r.v. as discrete when probability statements about
the individual possible outcomes have worthwhile meaning.



Probability Distributions for Discrete Random Variables



Probability Distribution Function

The probability distribution (function), $p(x)$, of a discrete r.v. X represents the
probability that X takes the value x, as a function of x, i.e.,

$$p(x) = P(X = x) \text{ for all values of } x.$$

- Sometimes, the probability distribution of a discrete r.v. is called the probability
mass function (pmf).
- Note that X = x must be an event; otherwise, $P(X = x)$ is not well defined.
$p(x)$ must satisfy the following properties (implied by the probability postulates in
Lecture 2):

1. $0 \le p(x) \le 1$ for any value x,
2. $\sum_{x \in S} p(x) = 1$, where S is called the support of X, i.e., the set of all x values
such that $p(x) > 0$.

Notation: I will use $p(x)$ and p (rather than $P(x)$ and P as in the textbook) for the pmf
and a probability of interest, to avoid confusion with the probability symbol P.



Example 4.1: Number of Product Sales




Cumulative Distribution Function

The cumulative distribution function (cdf), $F(x_0)$, of a r.v. X represents the
probability that X does not exceed the value $x_0$, as a function of $x_0$, i.e.,

$$F(x_0) = P(X \le x_0).$$

- The definition of the cdf applies to both discrete and continuous r.v.'s, and $x_0 \in \mathbb{R}$.
- $F(x_0)$ for a discrete r.v. is a step function with jumps only at support points in S.
[figure here]
- $p(\cdot)$ and $F(\cdot)$ are probabilistic counterparts of the histogram and ogive in Lecture 1.
Relationship between the pmf and cdf for discrete r.v.'s:

$$F(x_0) = \sum_{x \le x_0} p(x).$$

From the definition of the cdf, we have (i) $0 \le F(x_0) \le 1$ for any $x_0$; (ii) if $x_0 < x_1$,
then $F(x_0) \le F(x_1)$, i.e., $F(\cdot)$ is a (weakly) increasing function.
From the figure on the next slide, we can also see (iii) $F(x_0)$ is right continuous,
i.e., $\lim_{x \downarrow x_0} F(x) = F(x_0)$; (iv) $\lim_{x_0 \to -\infty} F(x_0) = 0$ and $\lim_{x_0 \to \infty} F(x_0) = 1$.
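The pmf properties and the pmf-cdf relationship are easy to check numerically. Below is a minimal Python sketch with a hypothetical pmf (not the actual numbers of Example 4.1):

```python
# A hypothetical pmf p(x) stored as a dict, and the cdf F(x0) = sum of
# p(x) over x <= x0.
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}  # hypothetical; must sum to 1

# Check the two pmf properties.
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-12

def cdf(x0, pmf):
    """F(x0) = P(X <= x0), a step function with jumps at support points."""
    return sum(p for x, p in pmf.items() if x <= x0)

print(cdf(1, pmf))    # ~0.3
print(cdf(1.5, pmf))  # ~0.3, constant between support points
print(cdf(3, pmf))    # ~1.0
```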



Example Continued

[Figure: the cdf of Example 4.1, a step function over x from −1 to 4 with marked levels 0, 0.1, 0.3, and 0.7]



Properties of Discrete Random Variables



Mean

The pmf contains all information about the probability properties of a discrete r.v.,
but it is desirable to have some summary measures of the pmf's characteristics.
The mean (or expected value, or expectation), $E[X]$, of a discrete r.v. X is defined
as

$$E[X] = \mu = \sum_{x \in S} x\,p(x).$$

- The mean of X is the same as the population mean in Lecture 1, $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, but
we use the probability language here: think of $E[X]$ in terms of relative
frequencies,

$$\frac{\sum_{i=1}^{N} x_i}{N} = \sum_{x \in S} x \cdot \frac{N_x}{N},$$

weighting each possible value x by its relative frequency $N_x/N$ (with $N_x$ the number
of population members equal to x), i.e., its probability.
- In other words, the mean of X is a weighted average of all possible values of X.
- For example, if we roll a die once, the expected outcome is

$$E[X] = \sum_{i=1}^{6} i \cdot \frac{1}{6} = 3.5.$$
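The die-roll expectation, computed from the definition (an illustrative Python sketch):

```python
# E[X] = sum of x * p(x) over the support, for a fair die.
pmf = {x: 1/6 for x in range(1, 7)}  # p(x) = 1/6 for x = 1..6
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 3.5 (up to floating-point rounding)
```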



Variance

The variance, $Var(X)$, of a discrete r.v. X is defined as

$$Var(X) = \sigma^2 = E\left[(X - \mu)^2\right] = \sum_{x \in S} (x - \mu)^2 p(x).$$

- This definition of $Var(X)$ is the same as the population variance in Lecture 1.
- It is not hard to see that

$$\sigma^2 = \sum_{x \in S} (x - \mu)^2 p(x) = \sum_{x \in S} x^2 p(x) - 2\mu \sum_{x \in S} x\,p(x) + \mu^2 \sum_{x \in S} p(x)$$
$$= E\left[X^2\right] - 2\mu E[X] + \mu^2 = E\left[X^2\right] - 2\mu^2 + \mu^2 = E\left[X^2\right] - \mu^2,$$

i.e., the second moment minus the first moment squared,¹ where in the third equality, $p(x)$ is the
probability of $X^2 = x^2$,² and $\sum_{x \in S} p(x) = 1$.
The standard deviation, $\sigma = \sqrt{\sigma^2}$, is the same as the population standard
deviation in Lecture 1.

¹ $\sigma^2$ is also called the second central moment.
² What will happen if X can take both 1 and $-1$? $\sum_{x^2 = 1} x^2 P(X^2 = x^2) = 1 \cdot (p(1) + p(-1))$
$= 1^2 p(1) + (-1)^2 p(-1)$.
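A quick numerical check of the shortcut formula $\sigma^2 = E[X^2] - \mu^2$ on the fair-die pmf (illustrative sketch):

```python
# Verify Var(X) = E[(X - mu)^2] = E[X^2] - mu^2 on the fair-die pmf.
pmf = {x: 1/6 for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())                     # E[X] = 3.5
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())    # definition
var_short = sum(x**2 * p for x, p in pmf.items()) - mu**2   # shortcut
print(var_def, var_short)  # both ~35/12 ~ 2.9167
```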

Mean of Functions of a R.V.

For a function of X, g(X), its mean, $E[g(X)]$, is defined as

$$E[g(X)] = \sum_{x \in S} g(x)\,p(x).$$

- e.g., X is the time to complete a contract, and g(X) is the cost when the
completion time is X; we want to know the expected cost.
$E[g(X)] \neq g(E[X])$ in general; e.g., if $g(X) = X^2$, then

$$E[g(X)] - g(E[X]) = E\left[X^2\right] - \mu^2 = \sigma^2 > 0.$$

- However, when g(X) is linear in X, $E[g(X)] = g(E[X])$.
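A numerical illustration of $E[g(X)] \neq g(E[X])$ for $g(x) = x^2$, again on the fair-die pmf (illustrative sketch):

```python
# The gap E[X^2] - (E[X])^2 is exactly the variance.
pmf = {x: 1/6 for x in range(1, 7)}

def E(g, pmf):
    """E[g(X)] = sum of g(x) * p(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

mu = E(lambda x: x, pmf)
print(E(lambda x: x**2, pmf))  # ~15.167 = E[X^2]
print(mu**2)                   # 12.25 = (E[X])^2, strictly smaller
```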



Mean and Variance of Linear Functions

For $Y = a + bX$ with a and b being fixed constant numbers,

$$\mu_Y := E[Y] = E[a + bX] = a + bE[X] =: a + b\mu_X,$$
$$\sigma_Y^2 := Var(Y) = Var(a + bX) = b^2 Var(X) =: b^2 \sigma_X^2,$$

and

$$\sigma_Y = \sqrt{Var(Y)} = |b|\,\sigma_X.$$

- The proof follows similar steps as on the last slide. [Exercise]
- The constant a does not contribute to the variance of Y.
Some Special Linear Functions:
- If b = 0, i.e., Y = a, then $E[a] = a$ and $Var(a) = 0$.
- If a = 0, i.e., Y = bX, then $E[bX] = bE[X]$ and $Var(bX) = b^2 Var(X)$.
- If $a = -\mu_X/\sigma_X$ and $b = 1/\sigma_X$, i.e., $Y = \frac{X - \mu_X}{\sigma_X}$ is the z-score of X, then

$$E\left[\frac{X - \mu_X}{\sigma_X}\right] = \frac{\mu_X - \mu_X}{\sigma_X} = 0$$

and

$$Var\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{Var(X)}{\sigma_X^2} = 1.$$

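A sketch verifying that the z-score has mean 0 and variance 1, using the fair-die pmf (illustrative; rounding absorbs floating-point error):

```python
# For Y = (X - mu)/sigma, check E[Y] = 0 and Var(Y) = 1.
import math

pmf = {x: 1/6 for x in range(1, 7)}  # fair die again
mu = sum(x * p for x, p in pmf.items())
sigma = math.sqrt(sum((x - mu)**2 * p for x, p in pmf.items()))

# pmf of the z-score Y: the same probabilities at transformed values.
z_pmf = {(x - mu) / sigma: p for x, p in pmf.items()}
mu_z = sum(y * p for y, p in z_pmf.items())
var_z = sum((y - mu_z)**2 * p for y, p in z_pmf.items())
print(round(mu_z, 10), round(var_z, 10))  # ~0.0 and ~1.0
```
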
Binomial Distribution



Bernoulli Distribution

The Bernoulli r.v. is a r.v. taking only two values, 0 and 1, labeled as "failure" and
"success". [figure here]
If the probability of success is $p(1) = p$, then the probability of failure is
$p(0) = 1 - p(1) = 1 - p$. This distribution is known as the Bernoulli distribution,
and we denote a r.v. X with this distribution as $X \sim \text{Bernoulli}(p)$.
The mean of a Bernoulli(p) r.v. X is

$$\mu_X = E[X] = 1 \cdot p + 0 \cdot (1 - p) = p,$$

and the variance is

$$\sigma_X^2 = Var(X) = (1 - p)^2 p + (0 - p)^2 (1 - p) = p(1 - p).$$

- When p = 0.5, $\sigma_X^2$ achieves its maximum; when p = 0 or 1, $\sigma_X^2 = 0$. [why?]



Jacob Bernoulli (1655-1705), Swiss

Jacob Bernoulli (1655-1705) was one of the many prominent mathematicians in the Bernoulli family.




Binomial Distribution

The binomial r.v. X is the number of successes in n independent trials of a
Bernoulli(p) r.v., denoted as $X \sim \text{Binomial}(n, p)$.
Denote $X_i$ as the outcome in the i-th trial; then the binomial r.v. $X = \sum_{i=1}^{n} X_i$.
After some thinking, we can figure out that the number of sequences with x
successes in n trials is $C_x^n$, and the probability of any sequence with x successes
is $p^x (1 - p)^{n - x}$ by the multiplication rule.
By the addition rule, the binomial distribution is

$$p(x|n, p) = C_x^n\, p^x (1 - p)^{n - x}, \quad x = 0, 1, \cdots, n.$$

From the discussion on multivariate r.v.'s below, we can show

$$\mu_X = E[X] \overset{(*)}{=} \sum_{i=1}^{n} E[X_i] = np,$$

and

$$\sigma_X^2 = Var(X) \overset{(**)}{=} \sum_{i=1}^{n} Var(X_i) = np(1 - p).$$

- (*) holds even if the $X_i$'s are dependent, while (**) depends on the independence of
the $X_i$'s; see the slides on jointly distributed r.v.'s.
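A sketch of the binomial pmf built directly from the formula, with a numerical check of the mean and variance against np and np(1 − p); math.comb computes $C_x^n$:

```python
# Binomial pmf p(x|n,p) = C(n,x) p^x (1-p)^(n-x).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2
print(round(mean, 10), round(var, 10))  # 10.0 and 5.0 = np and np(1-p)
```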

[Figure: pmfs of Binomial(20, 0.1), Binomial(20, 0.5), Binomial(20, 0.7), Binomial(40, 0.1), Binomial(40, 0.5), and Binomial(40, 0.7)]

Figure: Binomial Distributions with Different n and p



Poisson Distribution



Poisson Distribution

The Poisson distribution was proposed by Siméon Poisson in 1837. [figure here]
Assume that an interval is divided into a very large number of equal subintervals
so that the probability of the occurrence of an event in any subinterval is very small
(e.g., 0.05). The Poisson distribution models the number of events occurring on
that interval, assuming:
1. The probability of the occurrence of an event is constant for all subintervals.
2. There can be no more than one occurrence in each subinterval.
3. Occurrences are independent.

From these assumptions, we can see the Poisson distribution can be used to
model, e.g., the number of failures in a large computer system during a given day,
the number of ships arriving at a dock during a 6-hour loading period, the number
of defective products in large production runs, etc.
The Poisson distribution is particularly useful in waiting-line, or queuing, problems,
e.g., the probability of various numbers of customers waiting for a phone line or
waiting to check out of a large retail store.
- For a store manager, how to balance long lines (too few checkout lines, losing
customers) and idle customer service associates (too many lines, resulting in
waste)?

Siméon D. Poisson (1781-1840), French




Poisson Distribution (Continued)

Intuitively, the Poisson r.v. is the limit of the binomial r.v. as $p \to 0$ and $n \to \infty$. If
$np \to \lambda$, which specifies the average number of occurrences (successes) for a
particular time (and/or space), then the binomial distribution converges to the
Poisson distribution:

$$p(x|\lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \cdots,$$

where $e \approx 2.71828$ is the base of natural logarithms, called Euler's number.
[proof not required]
- We denote a r.v. X with the above Poisson distribution as $X \sim \text{Poisson}(\lambda)$.
- When n is large and np is of only moderate size (preferably $np \le 7$), the binomial
distribution can be approximated by Poisson(np). [figure here]
$\mu_X = E[X] = \lambda$, and $\sigma_X^2 = Var(X) = \lambda$.
- $np \to \lambda$, and $np(1 - p) = np - np \cdot p \to \lambda - \lambda \cdot 0 = \lambda$.
The sum of independent Poisson r.v.'s is also a Poisson r.v., e.g., the sum of K
independent Poisson($\lambda$) r.v.'s is a Poisson($K\lambda$) r.v.
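A sketch comparing Binomial(100, 0.02) with its Poisson(2) approximation, anticipating the figure on the next slide (illustrative; printed values are approximate):

```python
# Poisson pmf and the binomial-to-Poisson approximation with
# lambda = np = 100 * 0.02 = 2.
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

n, p = 100, 0.02
for x in range(6):
    print(x, round(binom_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))
# The two columns agree to roughly two decimal places.
```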



[Figure: pmfs of Binomial(100, 2%) and Poisson(2)]

Figure: Poisson Approximation

For an example where the approximation is not this good, see Assignment
II.8(iii).



Hypergeometric Distribution



Hypergeometric Distribution

If the binomial distribution can be treated as arising from random sampling with
replacement from a population of size N, S of which are successes and $S/N = p$,
then the hypergeometric distribution models the number of successes from
random sampling without replacement.
- These two random sampling schemes will be discussed more in Lecture 5.
The hypergeometric distribution is

$$p(x|n, N, S) = \frac{C_x^S\, C_{n-x}^{N-S}}{C_n^N}, \quad x = \max(0, n - (N - S)), \cdots, \min(n, S),$$

where n is the size of the random sample, and x is the number of successes.
- A r.v. with this distribution is denoted as $X \sim \text{Hypergeometric}(n, N, S)$.
The binomial distribution assumes the items are drawn independently, with the
probability of selecting an item being constant.
This assumption can be met in practice if a small sample is drawn (without
replacement) from a large population (e.g., N > 10,000 and n/N < 1%). [figure
here]
When we draw from a small population, the probability of selecting an item
changes with each selection because the number of remaining items changes.
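A sketch of the hypergeometric pmf from the formula, checked against the mean formula $nS/N = np$ stated on the next slide:

```python
# Hypergeometric pmf p(x|n,N,S) = C(S,x) C(N-S,n-x) / C(N,n).
from math import comb

def hypergeom_pmf(x, n, N, S):
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

# Drawing n = 20 without replacement from N = 100 items, S = 20 successes:
n, N, S = 20, 100, 20
support = range(max(0, n - (N - S)), min(n, S) + 1)
mean = sum(x * hypergeom_pmf(x, n, N, S) for x in support)
print(round(mean, 10))  # 4.0 = n * S/N = np
```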



[Figure: pmfs of Binomial(20, 0.2), Hypergeometric(20, 100, 20), and Hypergeometric(20, 1000, 200)]

Figure: Comparison of Binomial and Hypergeometric Distributions

$\mu_X = E[X] = np$, and $\sigma_X^2 = Var(X) = np(1 - p)\frac{N - n}{N - 1} \le np(1 - p)$,³ where $p = \frac{S}{N}$.
[proof not required]

³ When $\frac{n}{N}$ is small, $\frac{N - n}{N - 1}$ is close to 1, matching the variance of the binomial r.v.
Jointly Distributed Discrete Random Variables



Bivariate Discrete R.V.’s: Joint and Marginal Probability Distributions

We can use a bivariate probability distribution to model the relationship between two
univariate r.v.'s.
For two discrete r.v.'s X and Y, their joint probability distribution expresses the
probability that simultaneously X takes the specific value x and Y takes the value
y, as a function of x and y:

$$p(x, y) = P(X = x \cap Y = y), \quad x \in S_X \text{ and } y \in S_Y.$$

- $p(x, y)$ is a straightforward extension of joint probabilities in Lecture 2, where
X = x and Y = y are two events with x and y indexing them.
- From the probability postulates in Lecture 2, $0 \le p(x, y) \le 1$, and
$\sum_{x \in S_X} \sum_{y \in S_Y} p(x, y) = 1$.
The marginal probability distribution of X is

$$p(x) = \sum_{y \in S_Y} p(x, y),$$

and the marginal probability distribution of Y is

$$p(y) = \sum_{x \in S_X} p(x, y).$$
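A sketch of a joint pmf and its marginals; the joint probabilities here are hypothetical, chosen only for illustration:

```python
# A hypothetical joint pmf p(x, y) stored as a dict keyed by (x, y);
# marginals are obtained by summing out the other variable.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
assert abs(sum(joint.values()) - 1) < 1e-12

def marginal_x(joint):
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

print(marginal_x(joint))  # {0: 0.5, 1: 0.5}
```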



Conditional Probability Distribution and Independence of Bivariate R.V.’s

These two concepts are parallel to conditional probabilities and independent
events in Lecture 2.
The conditional probability distribution of Y, given that X takes the value x,
expresses the probability that Y takes the value y, as a function of y, when the
value x is fixed for X:

$$p(y|x) = \frac{p(x, y)}{p(x)};$$

similarly, the conditional probability distribution of X, given Y = y, is

$$p(x|y) = \frac{p(x, y)}{p(y)}.$$

- One way of thinking of conditioning is filtering a data set based on the value of X.
Two r.v.'s X and Y are independent iff

$$p(x, y) = p(x)\,p(y)$$

for all $x \in S_X$ and $y \in S_Y$, i.e., independence of r.v.'s can be understood as a set
of independencies of events. E.g., "height" and "musical talent" are independent.
- Generally, k r.v.'s are independent if $p(x_1, \cdots, x_k) = p(x_1)\,p(x_2) \cdots p(x_k)$.
- X and Y are independent iff $p(y|x) = p(y)$ or $p(x|y) = p(x)$ (the two conditions are
symmetric).
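Conditioning and the independence check in code, reusing the same hypothetical joint pmf (illustrative sketch):

```python
# Conditional pmf p(y|x) = p(x,y)/p(x), and a check of p(x,y) == p(x)p(y).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (x2, y), p in joint.items() if x2 == x) for x in (0, 1)}
py = {y: sum(p for (x, y2), p in joint.items() if y2 == y) for y in (0, 1)}

cond_y_given_0 = {y: joint[(0, y)] / px[0] for y in (0, 1)}
print(cond_y_given_0)  # {0: 0.6, 1: 0.4}, not equal to py -> dependent

independent = all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(independent)  # False
```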

Conditional Mean and Variance

The conditional mean of Y, given that X takes the value x, is given by

$$\mu_{Y|X=x} = E[Y|X = x] = \sum_{y \in S_Y} y\,p(y|x).$$

- For any constants a and b, $E[a + bY | X = x] = a + bE[Y|X = x]$.
The conditional variance of Y, given that X takes the value x, is given by

$$\sigma_{Y|X=x}^2 = Var(Y|X = x) = \sum_{y \in S_Y} \left(y - \mu_{Y|X=x}\right)^2 p(y|x).$$

- For any constants a and b, $Var(a + bY | X = x) = b^2 Var(Y|X = x)$.
Notation: The notations used in the textbook, $\mu_{Y|X}$ and $\sigma_{Y|X}^2$, are not clear.
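The conditional mean and variance of Y given X = 0 for the same hypothetical joint pmf (illustrative sketch):

```python
# E[Y|X=x] and Var(Y|X=x) computed from the conditional pmf p(y|x).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
x = 0
px = sum(p for (x2, y), p in joint.items() if x2 == x)
cond = {y: joint[(x, y)] / px for y in (0, 1)}
mu = sum(y * p for y, p in cond.items())
var = sum((y - mu)**2 * p for y, p in cond.items())
print(mu, var)  # E[Y|X=0] = 0.4, Var(Y|X=0) = 0.24
```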



Mean and Variance of (Linear) Functions

For a function of (X, Y), g(X, Y), its mean, $E[g(X, Y)]$, is defined as

$$E[g(X, Y)] = \sum_{x \in S_X} \sum_{y \in S_Y} g(x, y)\,p(x, y).$$

For a linear function of (X, Y), $W = aX + bY$,

$$\mu_W := E[W] = a\mu_X + b\mu_Y, \quad \text{[verified in the next slide]}$$
$$\sigma_W^2 := Var(W) = a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab\,\sigma_{XY}. \quad \text{[see the covariance slide below for } \sigma_{XY}\text{]}$$

- e.g., W is the total revenue of two products with (X, Y) being the sales and (a, b)
the prices.
- If a = b = 1, then $E[X + Y] = E[X] + E[Y]$, i.e., the mean of a sum is the sum of
the means.
- If a = 1 and b = −1, then $E[X - Y] = E[X] - E[Y]$, i.e., the mean of a difference is
the difference of the means.
- If a = b = 1 and $\sigma_{XY} = 0$, then $Var(X + Y) = Var(X) + Var(Y)$, i.e., the
variance of a sum is the sum of the variances.
- If a = 1, b = −1 and $\sigma_{XY} = 0$, then $Var(X - Y) = Var(X) + Var(Y)$, i.e., the
variance of a difference is the sum of the variances.
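A sketch computing $E[W]$ and $Var(W)$ both directly from the joint pmf and via the linear-combination formulas just stated; the joint pmf and the prices (a, b) are hypothetical:

```python
# W = aX + bY: direct computation vs. a*mu_X + b*mu_Y and
# a^2 sigma_X^2 + b^2 sigma_Y^2 + 2ab sigma_XY.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
a, b = 2.0, 3.0  # hypothetical prices of the two products

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x)**2)
var_y = E(lambda x, y: (y - mu_y)**2)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))

mu_w = E(lambda x, y: a*x + b*y)
var_w = E(lambda x, y: (a*x + b*y - mu_w)**2)
print(mu_w, a*mu_x + b*mu_y)                        # equal: 2.8
print(var_w, a**2*var_x + b**2*var_y + 2*a*b*cov)   # equal: 4.36
```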



(*) Verification and Extensions

$\mu_W$:

$$\mu_W = \sum_{x \in S_X} \sum_{y \in S_Y} (ax + by)\,p(x, y) = a \sum_{x \in S_X} x \sum_{y \in S_Y} p(x, y) + b \sum_{y \in S_Y} y \sum_{x \in S_X} p(x, y)$$
$$= a \sum_{x \in S_X} x\,p(x) + b \sum_{y \in S_Y} y\,p(y) = a\mu_X + b\mu_Y.$$

$\sigma_W^2$ can be derived based on this result. [Exercise]

Extension I: If $W = \sum_{i=1}^{K} a_i X_i$, then

$$\mu_W = E[W] = \sum_{i=1}^{K} a_i E[X_i] =: \sum_{i=1}^{K} a_i \mu_i,$$



(*) Verification and Extensions (Continued)

and

$$\sigma_W^2 = Var(W) = \sum_{i=1}^{K} a_i^2 Var(X_i) + 2 \sum_{i=1}^{K-1} \sum_{j>i} a_i a_j Cov(X_i, X_j) =: \sum_{i=1}^{K} a_i^2 \sigma_i^2 + 2 \sum_{i=1}^{K-1} \sum_{j>i} a_i a_j \sigma_{ij}.$$

- If $a_i = 1$ for all i, then we have

$$E\left[\sum_{i=1}^{K} X_i\right] = \sum_{i=1}^{K} \mu_i \quad \text{and} \quad Var\left(\sum_{i=1}^{K} X_i\right) = \sum_{i=1}^{K} \sigma_i^2 + 2 \sum_{i=1}^{K-1} \sum_{j>i} \sigma_{ij},$$

where $Var\left(\sum_{i=1}^{K} X_i\right)$ reduces to $\sum_{i=1}^{K} \sigma_i^2$ if $\sigma_{ij} = 0$ for all $i \neq j$.

Extension II: For $W = aX + bY$ and a r.v. Z different from (X, Y),

$$E[W|Z = z] = a\mu_{X|Z=z} + b\mu_{Y|Z=z},$$
$$Var(W|Z = z) = a^2 \sigma_{X|Z=z}^2 + b^2 \sigma_{Y|Z=z}^2 + 2ab\,\sigma_{XY|Z=z},$$

where $Var(W|Z = z)$ reduces to $a^2 \sigma_{X|Z=z}^2 + b^2 \sigma_{Y|Z=z}^2$ if $\sigma_{XY|Z=z} = 0$.



Covariance and Correlation

These two concepts are the same as those in Lecture 1 but in the probability
language.
The covariance between X and Y is

$$Cov(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{x \in S_X} \sum_{y \in S_Y} (x - \mu_X)(y - \mu_Y)\,p(x, y).$$

- It is not hard to show that $Cov(X, Y) = E[XY] - \mu_X \mu_Y$, which reduces to
$Var(X) = E\left[X^2\right] - \mu_X^2$ when X = Y.
The correlation between X and Y is

$$Corr(X, Y) = \rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}.$$

Recall that $\sigma_{XY}$ is not unit-free and so is unbounded, while $\rho_{XY} \in [-1, 1]$ is more
useful.
Recall that $\sigma_{XY}$ and $\rho_{XY}$ have the same sign: if they are positive, X and Y are
called positively dependent; when they are negative, X and Y are called negatively
dependent; when they are zero, there is no linear relationship between X and Y.
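Covariance via the shortcut formula and the resulting correlation, on the same hypothetical joint pmf (illustrative sketch):

```python
# Cov(X,Y) = E[XY] - mu_X mu_Y, and rho = Cov / (sigma_X sigma_Y).
import math

joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - mu_x * mu_y
sd_x = math.sqrt(E(lambda x, y: (x - mu_x)**2))
sd_y = math.sqrt(E(lambda x, y: (y - mu_y)**2))
print(cov, cov / (sd_x * sd_y))  # 0.1 and rho ~ 0.408
```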



Covariance and Independence

If X and Y are independent, then $Cov(X, Y) = Corr(X, Y) = 0$. [Exercise]
- The converse is not true; recall the figure in Lecture 1.
Here is a concrete example: if the distribution of X is

$$p(-1) = 1/4, \quad p(0) = 1/2 \quad \text{and} \quad p(1) = 1/4,$$

then $Cov(X, Y) = 0$ with $Y = X^2$. Why?
Because X can determine Y, X and Y are not independent.
The distribution of X implies $E[X] = 0$.
The distribution of Y is $p(0) = p(1) = 1/2$, i.e., Y is a Bernoulli r.v., which implies
$E[Y] = 1/2$.
The joint distribution of (X, Y) is

$$p(-1, 1) = 1/4, \quad p(0, 0) = 1/2, \quad p(1, 1) = 1/4,$$

which implies $E[XY] = 0$, so $Cov(X, Y) = E[XY] - E[X]E[Y] = 0$.
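The slide's example in code (the joint pmf below encodes $Y = X^2$):

```python
# Y = X^2 with X in {-1, 0, 1}: Cov(X, Y) = 0 even though Y is a
# deterministic function of X.
joint = {(-1, 1): 0.25, (0, 0): 0.5, (1, 1): 0.25}  # p(x, y) with y = x^2
E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)  # 0.0: uncorrelated, yet clearly not independent
```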


Portfolio analysis in the textbook (Pages 190-192, 236-240) will be discussed in
the next tutorial class.

