
DSC511: Statistical Foundations for Data Science
Discrete Random Variable

Bakkyaraj T
Indian Institute of Information Technology Kottayam
Email: [email protected]
Random Variable

I To analyze random experiments, we focus on numerical aspects of the experiment.
I For example, if an entire soccer game is a random experiment, then numerical results such as goals, shots, and fouls are random variables.
I A random variable is a real-valued variable whose value is determined by an underlying random experiment.
Random Variable - A Simple Example

I I toss a coin five times. This is a random experiment.
I The sample space is

S = {TTTTT, TTTTH, ..., HHHHH}.

I Say we are interested in the number of heads.
I We define a random variable X whose value is the number of observed heads.
I X can take the values 0, 1, 2, 3, 4, or 5 depending on the outcome of the experiment.
I For example, X = 0 for the outcome TTTTT and X = 2 for THTHT.
I Thus we see that X is a function from the sample space S to the real numbers.
Important Definitions

Random variable: A random variable X is a function from the sample space to the real numbers,

X : S → R.

The range of a random variable X, shown by Range(X) or RX, is the set of possible values of X.
Discrete Random Variables

I There are two important classes of random variables: discrete and continuous.
I A third class, mixed random variables, can be thought of as a mixture of discrete and continuous random variables.
I We define a discrete random variable as follows:

X is a discrete random variable if its range is countable.

I A set A is countable if either
I A is a finite set such as {1, 2, 3, 4}, or
I it can be put in one-to-one correspondence with the natural numbers (in this case, the set is said to be countably infinite).
Probability Mass Function (PMF)

I An event A = {X = xk} is defined as the set of outcomes s in the sample space S for which the value of X is xk:

A = {s ∈ S | X(s) = xk}.

I The probabilities of the events {X = xk} are given by the probability mass function (PMF) of X.

Let X be a discrete random variable with range RX = {x1, x2, x3, ...} (finite or countably infinite). The function

PX (xk) = P(X = xk), for k = 1, 2, 3, ...,

is called the probability mass function (PMF) of X.


Probability Mass Function (PMF)

I The PMF is a probability measure that gives us the probabilities of the possible values of a random variable.
I We use PX as the standard notation, where the subscript indicates that this is the PMF of the random variable X.
I Example: I toss a fair coin twice and let X be defined as the number of heads I observe.
I Find the range of X, RX, as well as the probability mass function PX. (A quick check is sketched below.)
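One way to check the answer is to enumerate the sample space directly. A minimal Python sketch (our own illustration; the H/T labels are our convention, and the coin is fair, so each of the four outcomes has probability 1/4):

from itertools import product
from collections import Counter

# Enumerate S = {HH, HT, TH, TT}; each outcome has probability 1/4.
outcomes = ["".join(t) for t in product("HT", repeat=2)]
counts = Counter(o.count("H") for o in outcomes)

pmf = {x: c / len(outcomes) for x, c in sorted(counts.items())}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}, so RX = {0, 1, 2}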
Probability Mass Function (PMF)

Properties of the PMF:
I 0 ≤ PX (x) ≤ 1 for all x;
I Σ_{x ∈ RX} PX (x) = 1;
I for any set A ⊂ RX, P(X ∈ A) = Σ_{x ∈ A} PX (x).
Independent Random Variables

I The concept of independent random variables is very similar to that of independent events.
I Recall that events A and B are independent if P(A ∩ B) = P(A)P(B).
I This leads to the definition of independent random variables:

Consider two discrete random variables X and Y. We say that X and Y are independent if

P(X = x, Y = y) = P(X = x)P(Y = y), for all x, y.

In general, if two random variables are independent, then you can write

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B).
Independent Random Variables

I Intuitively, two random variables X and Y are independent if knowing the value of one does not change the probabilities for the other, i.e.,

P(Y = y | X = x) = P(Y = y), for all x, y.

I In general:

Consider n discrete random variables X1, X2, X3, ..., Xn. We say that X1, X2, X3, ..., Xn are independent if

P(X1 = x1, X2 = x2, ..., Xn = xn) = P(X1 = x1)P(X2 = x2)...P(Xn = xn)

for all x1, x2, ..., xn.


Independent Random Variables

I It is possible to argue that two random variables are independent simply because they do not have any physical interaction with each other.
I Example: I toss a coin and define X to be the number of heads I observe. Then I toss the coin two more times and define Y to be the number of heads that I observe this time.
I What is the probability P((X < 2) ∩ (Y > 1))? (A simulation sketch follows.)
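A minimal simulation sketch (our own illustration; the slide does not say the coin is fair, so p = 0.5 is an assumption). Since X and Y come from disjoint sets of tosses, they are independent and the probability factors as P(X < 2) P(Y > 1).

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.binomial(1, 0.5, size=n)  # heads in the first (single) toss
Y = rng.binomial(2, 0.5, size=n)  # heads in the next two tosses

print(np.mean((X < 2) & (Y > 1)))  # close to P(X < 2) * P(Y > 1) = 1 * 0.25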
Special Distributions

I These are specific distributions that are used over and over in practice.
I It is important to understand the random experiment associated with each of these distributions.
I Since these random experiments model many real-life phenomena, they are used in a wide range of applications.
I Rather than trying to memorize each PMF, one should understand the random experiment and derive the PMF from it.
Bernoulli Distribution

I A Bernoulli random variable is a random variable that can take only two possible values, 0 and 1.
I It is used to model random experiments that have two possible outcomes, sometimes referred to as success and failure.
I Simple examples include:
I You take a pass/fail exam. You either pass (X = 1) or fail (X = 0).
I You toss a coin. The outcome is either heads or tails.
I A child is born. The gender is either male or female.
Bernoulli Distribution

I In general:

A random variable X is said to be a Bernoulli random variable with parameter p, shown as X ∼ Bernoulli(p), if its PMF is given by

PX (x) = p,      for x = 1
       = 1 − p,  for x = 0
       = 0,      otherwise

where 0 < p < 1.
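A minimal sketch using scipy.stats (our own illustration, not part of the slides; p = 0.3 is an arbitrary choice):

from scipy.stats import bernoulli

p = 0.3
print(bernoulli.pmf(1, p), bernoulli.pmf(0, p))   # p and 1 - p
print(bernoulli.rvs(p, size=10, random_state=0))  # ten Bernoulli trials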


Bernoulli Distribution

I The PMF of a Bernoulli(p) random variable:

Figure: PMF of a Bernoulli(p) random variable (point masses of height 1 − p at x = 0 and p at x = 1)


Bernoulli Distribution

I The Bernoulli random variable is also called the indicator random variable.
I The indicator random variable IA for an event A is defined as

IA = 1,  if the event A occurs
   = 0,  otherwise

I The indicator random variable for an event A has a Bernoulli distribution with parameter p = P(A), so we can write

IA ∼ Bernoulli(P(A)).
Geometric Distribution

I Suppose you have a coin with P(H) = p. You toss the coin until you observe the first heads.
I If we define the random variable X as the total number of coin tosses in the experiment, then X has a geometric distribution with parameter p.
I In this case RX = {1, 2, 3, ...} and

PX (k) = P(X = k) = (1 − p)^(k−1) p, for k = 1, 2, 3, ...

I You can think of this experiment as repeating Bernoulli trials until observing the first success.
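A minimal sketch (our own illustration; scipy.stats.geom uses the same "number of trials" convention as above, with support {1, 2, 3, ...}):

from scipy.stats import geom

p = 0.3
for k in range(1, 6):
    # PX(k) = (1 - p)**(k - 1) * p
    print(k, geom.pmf(k, p), (1 - p) ** (k - 1) * p)  # columns agree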
Geometric Distribution

I In general:

A random variable X is said to be a geometric random variable with parameter p, shown as X ∼ Geometric(p), if its PMF is given by

PX (k) = p(1 − p)^(k−1),  for k = 1, 2, 3, ...
       = 0,               otherwise

where 0 < p < 1.


Geometric Distribution

I Some books also define the geometric random variable X as the total number of failures before observing the first success.
I In this case RX = {0, 1, 2, ...} and the PMF is given by

PX (k) = p(1 − p)^k,  for k = 0, 1, 2, 3, ...
       = 0,           otherwise
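A minimal sketch relating the two conventions (our own illustration; the "failures" version is the "trials" version shifted down by one):

from scipy.stats import geom

p = 0.3
for k in range(5):
    trials_pmf = geom.pmf(k + 1, p)      # trials convention, at k + 1
    failures_pmf = p * (1 - p) ** k      # failures convention, at k
    print(k, trials_pmf, failures_pmf)   # the two columns agree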
Geometric Distribution

I The PMF of a geometric random variable with parameter p = 0.3:

Figure: PMF of a Geometric(0.3) random variable (PX (1) = 0.3, decaying geometrically for larger x)


Binomial Distribution

I Suppose you have a coin with P(H) = p.
I You toss the coin n times and define X to be the total number of heads you observe.
I Then X is a binomial random variable with parameters n and p.
I In this case RX = {0, 1, 2, ..., n} and

PX (k) = (n choose k) p^k (1 − p)^(n−k), for k = 0, 1, 2, ..., n.
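A minimal sketch checking this formula against scipy.stats (our own illustration; n = 10 and p = 0.3 echo the figure two slides below):

from math import comb
from scipy.stats import binom

n, p = 10, 0.3
for k in range(n + 1):
    by_hand = comb(n, k) * p**k * (1 - p)**(n - k)
    assert abs(binom.pmf(k, n, p) - by_hand) < 1e-12
print(binom.pmf(3, n, p))  # the most likely value for these parameters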
Binomial Distribution

I In general:

A random variable X is said to be a binomial random variable with parameters n and p, shown as X ∼ Binomial(n, p), if its PMF is given by

PX (k) = (n choose k) p^k (1 − p)^(n−k),  for k = 0, 1, ..., n
       = 0,                               otherwise

where 0 < p < 1.


Binomial Distribution

I The PMF of a Binomial(10, 0.3) random variable:

Figure: PMF of a Binomial(10, 0.3) random variable (peaked near x = 3)


Binomial Distribution

I The PMF of a Binomial(20, 0.6) random variable:

Figure: PMF of a Binomial(20, 0.6) random variable (peaked near x = 12)


Binomial RV as a Sum of Bernoulli RVs

I A Binomial(n, p) random variable can be obtained from n independent coin tosses.
I If each coin toss is a Bernoulli(p), then the Binomial(n, p) random variable is the sum of n independent Bernoulli random variables.
I We have the following lemma:

Lemma: If X1, X2, ..., Xn are independent Bernoulli(p) random variables, then the random variable X defined by X = X1 + X2 + ... + Xn has a Binomial(n, p) distribution.
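A minimal simulation sketch of the lemma (our own illustration; n, p, and the number of trials are arbitrary choices):

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, p, trials = 10, 0.3, 100_000
X = rng.binomial(1, p, size=(trials, n)).sum(axis=1)  # sums of n Bernoullis

for k in range(n + 1):
    print(k, np.mean(X == k), binom.pmf(k, n, p))     # columns should agree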
Binomial Distribution

I Example: Let X ∼ Binomial(n, p) and Y ∼ Binomial(m, p) be two independent random variables.
I Define the random variable Z = X + Y. Find the PMF of Z.
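I Hint: by the previous lemma, X is a sum of n independent Bernoulli(p) random variables and Y is a sum of m more, all independent, so Z is a sum of n + m independent Bernoulli(p) random variables and hence Z ∼ Binomial(n + m, p).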
Negative Binomial (Pascal) Distribution

I A generalization of the geometric distribution.
I It relates to the random experiment of repeating independent trials until observing m successes.
I Suppose you have a coin with P(H) = p and you toss it until you observe m heads, where m ∈ N.
I We define X as the total number of coin tosses in this experiment. X is said to have a Pascal distribution with parameters m and p.
I The range of X is RX = {m, m + 1, m + 2, m + 3, ...}.
I Note: Pascal(1, p) = Geometric(p).
Negative Binomial (Pascal) Distribution

I In general:

A random variable X is said to be a Pascal random variable with parameters m and p, shown as X ∼ Pascal(m, p), if its PMF is given by

PX (k) = (k−1 choose m−1) p^m (1 − p)^(k−m),  for k ∈ RX
       = 0,                                   otherwise

where 0 < p < 1.


Negative Binomial (Pascal) Distribution

I We want the PMF of a Pascal(m, p) random variable.
I Using the definition of the experiment from before, to find P(A), where A = {X = k}, we can write A = B ∩ C, where
I B is the event that we observe m − 1 heads in the first k − 1 trials, and
I C is the event that we observe heads in the kth trial.
I B and C are independent, thus P(B ∩ C) = P(B)P(C).
I We have P(C) = p, and using the binomial formula,

P(B) = (k−1 choose m−1) p^(m−1) (1 − p)^((k−1)−(m−1)).

I Combining the two gives us the general formula P(A) = (k−1 choose m−1) p^m (1 − p)^(k−m).
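A minimal sketch against scipy.stats (our own illustration; note the convention difference: scipy.stats.nbinom counts failures before the m-th success, so the total trial count k corresponds to nbinom's argument k − m):

from math import comb
from scipy.stats import nbinom

m, p = 3, 0.5
for k in range(m, m + 5):
    pascal_pmf = comb(k - 1, m - 1) * p**m * (1 - p)**(k - m)
    print(k, pascal_pmf, nbinom.pmf(k - m, m, p))  # columns should agree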
Negative Binomial (Pascal) Distribution

I The PMF of a Pascal(m, p) random variable with m = 3 and p = 0.5:

Figure: PMF of a Pascal(3, 0.5) (negative binomial) random variable (peaked at x = 4 and x = 5)


Hypergeometric Distribution

I You have a bag that contains b blue marbles and r red marbles.
I Choose k ≤ b + r marbles at random without replacement.
I Let X be the number of blue marbles in your sample.
I By definition, X ≤ min(k, b), and the number of red marbles in your sample must be less than or equal to r, so we have X ≥ max(0, k − r).
I RX = {max(0, k − r), max(0, k − r) + 1, max(0, k − r) + 2, ..., min(k, b)}.
I The total number of ways to choose k marbles from b + r marbles is (b+r choose k).
Hypergeometric Distribution

I The number of ways to choose x blue marbles and k − x red marbles is (b choose x)(r choose k−x).
I In general we have:

A random variable X is said to be a hypergeometric random variable with parameters b, r, and k, shown as X ∼ Hypergeometric(b, r, k), if its range is RX as above and its PMF is given by

PX (x) = (b choose x)(r choose k−x) / (b+r choose k),  for x ∈ RX
       = 0,                                            otherwise
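A minimal sketch against scipy.stats (our own illustration; scipy.stats.hypergeom(M, n, N) takes M = population size b + r, n = number of blue marbles b, and N = sample size k):

from math import comb
from scipy.stats import hypergeom

b, r, k = 5, 7, 4
for x in range(max(0, k - r), min(k, b) + 1):
    slide_pmf = comb(b, x) * comb(r, k - x) / comb(b + r, k)
    print(x, slide_pmf, hypergeom.pmf(x, b + r, b, k))  # should agree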
Poisson Distribution

I A very widely used probability distribution.
I It is used for counting the occurrences of certain events in an interval of time or space.
I Suppose we are counting the number of customers who visit a certain store from 1pm to 2pm.
I Based on data from previous days, we know that on average λ = 15 customers visit the store in this interval.
I We can model the random variable X showing the number of customers as a Poisson random variable with parameter λ = 15.
Poisson Distribution

I In general:

A random variable X is said to be a Poisson random variable with parameter λ, shown as X ∼ Poisson(λ), if its range is RX = {0, 1, 2, 3, ...} and its PMF is given by

PX (k) = e^(−λ) λ^k / k!,  for k ∈ RX
       = 0,                otherwise
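A minimal sketch checking the PMF formula against scipy.stats (our own illustration; λ = 15 echoes the store example above):

from math import exp, factorial
from scipy.stats import poisson

lam = 15
for k in (0, 5, 10, 15, 20):
    by_hand = exp(-lam) * lam**k / factorial(k)
    print(k, by_hand, poisson.pmf(k, lam))  # columns should agree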
Poisson Distribution

I The PMF of a Poisson random variable with λ = 1:

Figure: PMF of a Poisson(1) random variable (PX (0) = PX (1) = e^(−1) ≈ 0.37, then rapidly decaying)


Poisson Distribution

I The PMF of a Poisson random variable with λ = 10:

Figure: PMF of a Poisson(10) random variable (peaked around x = 9 and x = 10)


Poisson Distribution

I Example: The number of emails that I get on a weekday can be modeled by a Poisson distribution with an average of 0.2 emails per minute.
I What is the probability that I get no emails in an interval of length 5 minutes?
I What is the probability that I get more than 3 emails in an interval of length 10 minutes? (A sketch of the computation follows.)
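A minimal sketch of the computation (our own illustration, assuming the standard scaling for Poisson counts: a rate of 0.2 emails per minute gives λ = 0.2t for an interval of t minutes):

from scipy.stats import poisson

lam5 = 0.2 * 5                    # λ = 1 for a 5-minute interval
print(poisson.pmf(0, lam5))       # P(no emails) = e**-1 ≈ 0.368

lam10 = 0.2 * 10                  # λ = 2 for a 10-minute interval
print(1 - poisson.cdf(3, lam10))  # P(more than 3 emails) ≈ 0.143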
Poisson Distribution

I The Poisson distribution can be viewed as an approximation of the binomial distribution.
I This is useful, as the Poisson PMF is much easier to compute than the binomial PMF.
I Thus we have:

Theorem: Let X ∼ Binomial(n, p = λ/n), where λ > 0 is fixed. Then for any k ∈ {0, 1, 2, ...}, we have

lim_{n→∞} PX (k) = e^(−λ) λ^k / k!.
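A minimal numerical sketch of the theorem (our own illustration; λ = 10 and k = 9 are arbitrary choices):

from scipy.stats import binom, poisson

lam, k = 10, 9
for n in (20, 100, 1000, 10_000):
    print(n, binom.pmf(k, n, lam / n))  # approaches the Poisson value
print("limit:", poisson.pmf(k, lam))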
Cumulative Distribution Function (CDF)

I The cumulative distribution function (CDF) of a random variable is another way to describe its distribution.
I The advantage of the CDF is that it can be defined for any kind of random variable, while the PMF cannot be defined for a continuous random variable.
I We have:

Definition: The cumulative distribution function (CDF) of a random variable X is defined as

FX (x) = P(X ≤ x), for all x ∈ R.


Cumulative Distribution Function (CDF)

I In general, let X be a discrete random variable with range RX = {x1, x2, x3, ...}, such that x1 < x2 < x3 < ....
I FX (x) = 0 for x < x1. Note that the CDF starts at 0, i.e., FX (−∞) = 0.
I The CDF has the form of a staircase: it jumps at each point in the range.
I The CDF stays flat between xk and xk+1, so we can write

FX (x) = FX (xk ), for xk ≤ x < xk+1.

I The CDF jumps at each xk, and the size of the jump is the PMF there. We can write

FX (xk ) − FX (xk − ε) = PX (xk ), for ε > 0 small enough.


Cumulative Distribution Function (CDF)
I The CDF is always a non-decreasing function, i.e., if y ≥ x then FX (y) ≥ FX (x).
I It approaches 1 as x becomes large. We can write

lim_{x→∞} FX (x) = 1.

I If we have the PMF, we can calculate the CDF. If RX = {x1, x2, x3, ...}, then

FX (x) = Σ_{xk ≤ x} PX (xk ).

I We also have the useful formula:

For all a ≤ b, we have

P(a < X ≤ b) = FX (b) − FX (a).


Cumulative Distribution Function (CDF)

I The CDF of a discrete random variable:

Figure: CDF of a discrete random variable (a staircase that equals 0 for x < x1, jumps by PX (xk ) at each xk, and approaches 1 as x → ∞)


Cumulative Distribution Function

I Let X be a discrete random variable with range RX = {1, 2, 3, ...}. Suppose the PMF of X is given by

PX (k) = 1/2^k, for k = 1, 2, 3, ...

I Find and plot the CDF of X, FX (x).
I Find P(2 < X ≤ 5).
I Find P(X > 4). (A numerical check is sketched below.)
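A minimal numerical check (our own illustration, not a full solution): build the CDF from the PMF PX(k) = 1/2**k and evaluate the two requested probabilities.

def cdf(x):
    # FX(x) = sum of 1 / 2**k over integers k with 1 <= k <= x
    return sum(2.0 ** -k for k in range(1, int(x) + 1))

print(cdf(5) - cdf(2))  # P(2 < X <= 5) = 7/32 = 0.21875
print(1 - cdf(4))       # P(X > 4) = 1/16 = 0.0625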
Expectation

I The average of a random variable X is also called its expected value or mean.
I The expected value is defined as the weighted average of the values in the range.

Definition: Let X be a discrete random variable with range RX = {x1, x2, x3, ...} (finite or countably infinite). The expected value of X, written EX = E[X] = E(X) = µX, is defined as

EX = Σ_{xk ∈ RX} xk P(X = xk ) = Σ_{xk ∈ RX} xk PX (xk ).
Expectation

I The expectation for different special distributions:
I X ∼ Bernoulli(p): EX = p.
I X ∼ Geometric(p): EX = 1/p.
I X ∼ Poisson(λ): EX = λ.
I X ∼ Binomial(n, p): EX = np.
I X ∼ Pascal(m, p): EX = m/p.
I X ∼ Hypergeometric(b, r, k): EX = kb/(b + r).
Expectation

I Expectation is linear:

I E[aX + b] = aEX + b, for all a, b ∈ R;
I E[X1 + X2 + · · · + Xn] = EX1 + EX2 + · · · + EXn, for any set of random variables X1, X2, · · · , Xn.

I We can use the linearity of expectation to easily calculate the expected values of the binomial and Pascal distributions, as sketched below.
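I For example, by the earlier lemma we can write X ∼ Binomial(n, p) as X = X1 + X2 + ... + Xn with independent Xi ∼ Bernoulli(p), so linearity gives EX = EX1 + ... + EXn = np. Similarly, a Pascal(m, p) random variable is a sum of m independent Geometric(p) waiting times, so EX = m/p.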
Functions of Random Variables

I If X is a random variable and Y = g(X), then Y itself is a random variable.
I We can thus define its PMF, CDF, and expected value.
I The range of Y, RY, is given by

RY = {g(x) | x ∈ RX }.

I If we know the PMF of X, we can obtain the PMF of Y as

PY (y) = P(Y = y) = P(g(X) = y) = Σ_{x: g(x)=y} PX (x).
Functions of Random Variables

I Example: Let X be a discrete random variable with PX (k) = 1/5 for k = −1, 0, 1, 2, 3. Let Y = 2|X|.
I Find the range and PMF of Y. (A quick check is sketched below.)
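A minimal sketch (our own illustration): push the uniform PMF of X through g(x) = 2|x| and collect PY by summing PX over preimages.

from collections import defaultdict

PX = {k: 1 / 5 for k in (-1, 0, 1, 2, 3)}
PY = defaultdict(float)
for x, p in PX.items():
    PY[2 * abs(x)] += p           # PY(y) = sum of PX(x) over x with g(x) = y

print(dict(sorted(PY.items())))   # RY = {0, 2, 4, 6}; note PY(2) = 2/5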
Expected Value of a Function of a Random Variable - LOTUS

I Let X be a discrete random variable and Y = g(X).
I To calculate the expected value of Y, we can use LOTUS.

Law of the unconscious statistician (LOTUS) for discrete random variables:

E[g(X)] = Σ_{xk ∈ RX} g(xk ) PX (xk ).
Expected Value of a Function of a Random Variable - LOTUS

I Let X be a discrete random variable with range RX = {0, π/4, π/2, 3π/4, π}, such that

PX (0) = PX (π/4) = PX (π/2) = PX (3π/4) = PX (π) = 1/5.

I Find E[sin(X)]. (A worked check follows.)
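I By LOTUS, E[sin(X)] = (1/5)(sin 0 + sin(π/4) + sin(π/2) + sin(3π/4) + sin π) = (1/5)(0 + √2/2 + 1 + √2/2 + 0) = (1 + √2)/5 ≈ 0.483.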
Variance

I Consider random variables X and Y with PMFs

PX (x) = 0.5,  for x = −100
       = 0.5,  for x = 100
       = 0,    otherwise

PY (y) = 1,  for y = 0
       = 0,  otherwise

I EX = EY = 0, but the two distributions are very different.
I Variance is a measure of how spread out the distribution of a random variable is.
I The variance of Y is zero, as the distribution is completely concentrated at one point, while the variance of X is large.
Variance

The variance of a random variable X, with mean EX = µX, is defined as

Var(X) = E[(X − µX)^2].

Computational formula for the variance:

Var(X) = E[X^2] − (EX)^2    (1)

where E[X^2] = EX^2 = Σ_{xk ∈ RX} xk^2 PX (xk ).
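A minimal sketch applying formula (1) to the two-point PMF above:

PX = {-100: 0.5, 100: 0.5}

EX = sum(x * p for x, p in PX.items())      # 0.0
EX2 = sum(x**2 * p for x, p in PX.items())  # 10000.0
print(EX2 - EX**2)                          # Var(X) = 10000.0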
Standard Deviation and Useful Results

The standard deviation of a random variable X is defined as

SD(X) = σX = √Var(X).

Theorem: Variance is not a linear operator. For a random variable X and real numbers a and b,

Var(aX + b) = a^2 Var(X).
Standard Deviation and Useful Results

Theorem: If X1, X2, · · · , Xn are independent random variables and X = X1 + X2 + · · · + Xn, then

Var(X) = Var(X1) + Var(X2) + · · · + Var(Xn).

I Example: If X ∼ Binomial(n, p), find Var(X). (A sketch using the theorem follows.)
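I Sketch: write X = X1 + X2 + ... + Xn with independent Xi ∼ Bernoulli(p). Since Xi^2 = Xi, we have E[Xi^2] = p, so Var(Xi) = p − p^2 = p(1 − p), and the theorem gives Var(X) = np(1 − p).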
