DSC511: Statistical Foundations for Data
Science
Discrete Random Variable
Bakkyaraj T
Indian Institute of Information Technology Kottayam
Email:[email protected]
Random Variable
I To analyze random experiments, we focus on numerical
aspects of the experiment.
I For example, if an entire soccer game is a random experiment, then numerical results like goals, shots, fouls, etc. are random variables.
I A random variable is a real-valued variable whose value is determined by an underlying random experiment.
Random Variable- A Simple Example
I I toss a coin five times. This is a random experiment.
I The sample space is
S = {TTTTT, TTTTH, ..., HHHHH}.
I Say we are interested in the number of heads.
I We define a random variable X whose value is the number of observed heads.
I X can take values 0, 1, 2, 3, 4, or 5 depending on the outcome of the experiment.
I For example, X = 0 for the outcome TTTTT and X = 2 for
THTHT .
I Thus we see that X is a function from the sample space S to the real numbers.
Important Definitions
Random Variables: A random variable X is a function from
the sample space to the real numbers.
X : S → R
The range of a random variable X, shown by Range(X) or RX, is the set of possible values of X.
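To make the "function from S to R" view concrete, here is a minimal Python sketch of the five-toss example above (pure standard library; the names S and X are just illustrative):

```python
from itertools import product

# Sample space of five coin tosses: 'TTTTT', 'TTTTH', ..., 'HHHHH'
S = [''.join(s) for s in product('TH', repeat=5)]

# The random variable X maps each outcome to its number of heads.
def X(s):
    return s.count('H')

print(X('TTTTT'), X('THTHT'))      # 0 2
print(sorted({X(s) for s in S}))   # Range(X) = [0, 1, 2, 3, 4, 5]
```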
Discrete Random Variables
I There are two important classes of random variables: discrete and continuous.
I A third class, mixed random variables, can be thought of as a mixture of discrete and continuous random variables.
I We define a discrete random variable as follows:
X is a discrete random variable if its range is countable.
I A set A is countable if either
I A is a finite set such as {1, 2, 3, 4}, or
I it can be put in one-to-one correspondence with natural
numbers (in this case, the set is said to be countably infinite).
Probability Mass Function (PMF)
I An event A = {X = xk} is defined as the set of outcomes s in the sample space S for which the value of X is xk:
A = {s ∈ S | X(s) = xk}.
I The probabilities of the events {X = xk} are given by the probability mass function (PMF) of X.
Let X be a discrete random variable with range RX =
{x1 , x2 , x3 , ...} (finite or countably infinite). The function
PX (xk ) = P(X = xk ), for k = 1, 2, 3, ...,
is called the probability mass function (PMF) of X .
Probability Mass Function (PMF)
I The PMF is a probability measure that gives us probabilities
of the possible values of a random variable.
I We use PX as the standard notation where the subscript
indicates that this is the PMF of the random variable X .
I Example: I toss a fair coin twice and let X be defined as the
number of heads I observe.
I Find the range of X, RX, as well as the probability mass function PX (a quick enumeration sketch follows).
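As a quick sketch of the exercise (using the stated fairness of the coin, so all four outcomes are equally likely), the PMF can be read off by enumeration:

```python
from itertools import product
from collections import Counter

# Four equally likely outcomes of two fair-coin tosses.
outcomes = [''.join(s) for s in product('HT', repeat=2)]

# X = number of heads; each outcome carries probability 1/4.
counts = Counter(s.count('H') for s in outcomes)
pmf = {x: c / len(outcomes) for x, c in sorted(counts.items())}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}, so RX = {0, 1, 2}
```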
Probability Mass Function (PMF)
Properties of PMF:
I 0 ≤ PX(x) ≤ 1 for all x;
I Σ_{x ∈ RX} PX(x) = 1;
I for any set A ⊂ RX, P(X ∈ A) = Σ_{x ∈ A} PX(x).
Independent Random Variables
I The concept of independent random variables is very similar
to independent events.
I Recall that events A and B are independent if P(A ∩ B) = P(A)P(B).
I We have the following definition of independent random variables:
Consider two discrete random variables X and Y. We say that X and Y are independent if
P(X = x, Y = y) = P(X = x)P(Y = y), ∀ x, y.
In general, if two random variables are independent, then
you can write
P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B).
Independent Random Variables
I Intuitively, two random variables X and Y are independent if
knowing the value of one does not change the probabilities for
the other one, i.e.,
P(Y = y | X = x) = P(Y = y), ∀ x, y.
I In general
Consider n discrete random variables X1 , X2 , X3 , ..., Xn .
We say that X1 , X2 , X3 , ..., Xn are independent if
P(X1 = x1, X2 = x2, ..., Xn = xn) = P(X1 = x1)P(X2 = x2)...P(Xn = xn)
for all x1, x2, ..., xn.
Independent Random Variables
I We can often argue that two random variables are independent simply because they do not have any physical interaction with each other, as in the following example.
I Example: I toss a coin and define X to be the number of
heads I observe. Then I toss the coin two more times and
define Y to be the number of heads that I observe this time.
I What is the probability P((X < 2) ∩ (Y > 1))? (See the sketch below.)
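A minimal enumeration sketch, assuming a fair coin (the fairness is an assumption; the slide does not fix p). Since X ≤ 1 here, P(X < 2) = 1, so the answer reduces to P(Y > 1) = P(Y = 2) = 1/4:

```python
from itertools import product

# Assume a fair coin: 8 equally likely outcomes for three tosses.
# X = heads in the first toss, Y = heads in the next two tosses.
outcomes = list(product('HT', repeat=3))

fav = [s for s in outcomes
       if s[:1].count('H') < 2 and s[1:].count('H') > 1]
print(len(fav) / len(outcomes))  # 0.25 = P(X < 2) * P(Y > 1) = 1 * 1/4
```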
Special Distributions
I Specific distributions that are used over and over in practice.
I Important to understand the random experiment associated
with each of these distributions.
I Since these random experiments model a lot of real life
phenomena, they are used in different applications.
I Rather than trying to memorize the PMF, one should
understand the random experiment and derive the PMF from
it.
Bernoulli Distribution
I A Bernoulli random variable is a random variable that can take only two possible values, 0 and 1.
I Used to model random experiments that have two possible
outcomes, sometimes referred to as success and failure.
I Simple examples include
I You take a pass/fail exam. You either pass (X = 1) or fail (X = 0).
I You toss a coin. The outcome is either heads or tails.
I A child is born. The gender is either male or female.
Bernoulli Distribution
I In general
A random variable X is said to be a Bernoulli random
variable with parameter p, shown as X ∼ Bernoulli(p),
if its PMF is given by
PX(x) =
  p      for x = 1
  1 − p  for x = 0
  0      otherwise
where 0 < p < 1.
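A small sketch using scipy.stats (the value p = 0.3 is just illustrative):

```python
from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)
print(X.pmf(1), X.pmf(0))              # 0.3 0.7
print(X.rvs(size=10, random_state=0))  # ten simulated Bernoulli(0.3) values
```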
Bernoulli Distribution
I The PMF of a Bernoulli(p) random variable
[Figure: PMF of a Bernoulli(p) random variable: mass 1 − p at x = 0 and mass p at x = 1.]
Bernoulli Distribution
I A Bernoulli random variable is also called an indicator random variable.
I The indicator random variable IA for an event A is defined as
IA =
  1  if the event A occurs
  0  otherwise
The indicator random variable of an event A has a Bernoulli distribution with parameter p = P(A), so we can write IA ∼ Bernoulli(P(A)).
Geometric Distribution
I Suppose you have a coin with P(H) = p. You toss the coin until you observe the first heads.
I If we define the random variable X as the total number of coin tosses in the experiment, then X has a geometric distribution with parameter p.
I In this case RX = {1, 2, 3, ...} and
PX(k) = P(X = k) = (1 − p)^(k−1) p, for k = 1, 2, 3, ...
I You can think of this experiment as repeating Bernoulli trials
until observing the first success.
Geometric Distribution
I In general
A random variable X is said to be a geometric random
variable with parameter p, shown as X ∼ Geometric(p),
if its PMF is given by
PX(k) =
  p(1 − p)^(k−1)  for k = 1, 2, 3, ...
  0               otherwise
where 0 < p < 1.
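scipy.stats.geom uses exactly this convention (total number of trials up to and including the first success), so the formula can be checked directly; p = 0.3 is illustrative:

```python
from scipy.stats import geom

p = 0.3
X = geom(p)  # number of tosses up to and including the first heads
for k in range(1, 5):
    print(k, X.pmf(k), p * (1 - p)**(k - 1))  # the two columns agree
```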
Geometric Distribution
I Some books instead define the geometric random variable X as the total number of failures before observing the first success.
I In this case RX = {0, 1, 2, ...} and the PMF is given as
PX(k) =
  p(1 − p)^k  for k = 0, 1, 2, 3, ...
  0           otherwise
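In scipy this alternative convention is the same distribution with its support shifted left by one, which the loc argument expresses (a brief sketch with illustrative values):

```python
from scipy.stats import geom

p, k = 0.3, 2
# Failure-count convention: support {0, 1, 2, ...} = geom shifted by loc=-1.
print(geom.pmf(k, p, loc=-1), p * (1 - p)**k)  # both 0.147
```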
Geometric Distribution
I The PMF of a Geometric random variable with parameter p.
[Figure: PMF of a Geometric(0.3) random variable; PX(x) starts at PX(1) = 0.3 and decreases geometrically for x = 1, ..., 20.]
Binomial Distribution
I Suppose you have a coin with P(H) = p.
I You toss the coin n times and define X to be the total
number of heads you observe.
I Then X is a binomial random variable with parameters n and p.
I In this case RX = {0, 1, 2, ..., n} and
PX(k) = (n choose k) p^k (1 − p)^(n−k), for k = 0, 1, 2, ..., n.
Binomial Distribution
I In general
A random variable X is said to be a binomial ran-
dom variable with parameters n and p, shown as X ∼
Binomial(n, p), if its PMF is given by
PX(k) =
  (n choose k) p^k (1 − p)^(n−k)  for k = 0, 1, ..., n
  0                               otherwise
where 0 < p < 1.
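A short check of the formula against scipy.stats.binom (n = 10 and p = 0.3 are illustrative):

```python
from math import comb
from scipy.stats import binom

n, p = 10, 0.3
for k in (0, 3, 10):
    direct = comb(n, k) * p**k * (1 - p)**(n - k)
    print(k, direct, binom.pmf(k, n, p))     # the two columns agree
print(binom.pmf(range(n + 1), n, p).sum())   # the PMF sums to 1.0
```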
Binomial Distribution
I The PMF of a Binomial(10,0.3) random variable.
[Figure: PMF of a Binomial(10, 0.3) random variable; PX(x) peaks near x = 3.]
Binomial Distribution
I The PMF of a Binomial(20,0.6) random variable.
[Figure: PMF of a Binomial(20, 0.6) random variable; PX(x) peaks near x = 12.]
Binomial RV as a Sum of Bernoulli RVs
I A Binomial(n, p) random variable can be obtained from n independent coin tosses with P(H) = p.
I If each coin toss is a Bernoulli(p) random variable, then the Binomial(n, p) random variable is the sum of n independent Bernoulli random variables.
I We have the following lemma:
Lemma: If X1 , X2 , ..., Xn are independent Bernoulli(p)
random variables, then the random variable X defined by
X = X1 + X2 + ... + Xn has a Binomial(n, p) distribution.
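The lemma can be illustrated by simulation: summing rows of independent Bernoulli(p) samples reproduces the Binomial(n, p) PMF (a sketch with illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 10, 0.3, 100_000

# Each row holds n independent Bernoulli(p) values; row sums are Binomial(n, p).
sums = rng.binomial(1, p, size=(trials, n)).sum(axis=1)
print(np.mean(sums == 3))  # empirical P(X = 3), close to C(10,3) 0.3^3 0.7^7 ≈ 0.2668
```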
Binomial Distribution
I Example: Let X ∼ Binomial(n, p) and Y ∼ Binomial(m, p)
be two independent random variables.
I Define the random variable Z = X + Y. Find the PMF of Z (a numerical check follows).
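Since X and Y together count heads in n + m independent tosses of the same coin, one expects Z ∼ Binomial(n + m, p). A numerical convolution check of that claim (the values n, m, p are illustrative):

```python
from scipy.stats import binom

n, m, p = 4, 6, 0.3

# PMF of Z = X + Y by convolving the PMFs of X and Y ...
pz = [sum(binom.pmf(i, n, p) * binom.pmf(k - i, m, p) for i in range(k + 1))
      for k in range(n + m + 1)]

# ... matches the Binomial(n + m, p) PMF.
print(all(abs(pz[k] - binom.pmf(k, n + m, p)) < 1e-12
          for k in range(n + m + 1)))  # True
```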
Negative Binomial (Pascal) Distribution
I Generalization of the geometric distribution.
I Relates to the random experiment of repeated independent
trials until observing m successes.
I Suppose you have a coin with P(H) = p and you toss it until
you observe m heads, where m ∈ N.
I We define X as the total number of coin tosses in this experiment. X is said to have a Pascal distribution with parameters m and p.
I The range of X is RX = {m, m + 1, m + 2, m + 3, ...}.
I Note: Pascal(1,p)=Geometric(p).
Negative Binomial (Pascal) Distribution
I In general
A random variable X is said to be a Pascal random
variable with parameters m and p, shown as X ∼
Pascal(m, p), if its PMF is given by
PX(k) =
  (k−1 choose m−1) p^m (1 − p)^(k−m)  for k ∈ RX
  0                                   otherwise
where 0 < p < 1.
Negative Binomial (Pascal) Distribution
I We want the PMF of a Pascal(m, p) random variable.
I Using the experiment defined above, to find P(A), where A = {X = k}, we can write A = B ∩ C, where
I B is the event that we observe m − 1 heads in the first k − 1 trials, and
I C is the event that we observe heads on the kth trial.
I B and C are independent, thus P(B ∩ C ) = P(B)P(C ).
I We have P(C) = p, and using the binomial formula,
P(B) = (k−1 choose m−1) p^(m−1) (1 − p)^((k−1)−(m−1)).
I Combining the two gives us the general formula for P(A).
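The resulting formula can be checked against scipy.stats.nbinom; note that scipy parameterizes the negative binomial by the number of failures before the m-th success, so the total-toss count k corresponds to k − m failures (m = 3, p = 0.5 are illustrative):

```python
from math import comb
from scipy.stats import nbinom

m, p = 3, 0.5
for k in range(m, 8):
    direct = comb(k - 1, m - 1) * p**m * (1 - p)**(k - m)
    # scipy counts failures before the m-th success, hence the shift by m.
    print(k, direct, nbinom.pmf(k - m, m, p))  # the two columns agree
```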
Negative Binomial (Pascal) Distribution
I PMF of a Pascal(m,p) random variable with m = 3 and
p = 0.5.
[Figure: PMF of a Pascal(3, 0.5) (negative binomial) random variable; PX(x) peaks near x = 4 and x = 5 and decays for larger x.]
Hypergeometric Distribution
I You have a bag that contains b blue marbles and r red
marbles.
I Choose k ≤ b + r marbles at random without replacement.
I Let X be the number of blue marbles in your sample.
I By definition, X ≤ min(k, b), and the number of red marbles in your sample must be at most r, so X ≥ max(0, k − r).
I RX = {max(0, k − r), max(0, k − r) + 1, ..., min(k, b)}.
I The total number of ways to choose k marbles from b + r marbles is (b+r choose k).
Hypergeometric Distribution
I The number of ways to choose x blue marbles and k − x red marbles is (b choose x)(r choose k−x).
I In general we have
A random variable X is said to be a Hypergeometric
random variable with parameters b, r and k, shown as
X ∼ Hypergeometric(b, r , k), if its range is RX , and its
PMF is given by
PX(x) =
  (b choose x)(r choose k−x) / (b+r choose k)  for x ∈ RX
  0                                            otherwise
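scipy.stats.hypergeom implements this PMF with parameters M (population size), n (number of "success" items), and N (sample size); mapping our b, r, k onto those (the values are illustrative):

```python
from scipy.stats import hypergeom

b, r, k = 5, 7, 4        # 5 blue, 7 red, draw 4 without replacement
X = hypergeom(M=b + r, n=b, N=k)
for x in range(max(0, k - r), min(k, b) + 1):
    print(x, X.pmf(x))   # P(x blue marbles in the sample)
```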
Poisson Distribution
I Very widely used probability distribution.
I Used in counting the occurrences of certain events in an
interval of time or space.
I Suppose we are counting the number of customers who visit a
certain store from 1pm to 2pm.
I Based on data from previous days, we know that on average
λ = 15 customers visit the store.
I We can model the random variable X showing the number of
customers as Poisson random variable with parameter λ = 15.
Poisson Distribution
I In general
A random variable X is said to be a Poisson random
variable with parameter λ, shown as X ∼ Poisson(λ), if
its range is RX = {0, 1, 2, 3, ...}, and its PMF is given by
PX(k) =
  e^(−λ) λ^k / k!  for k ∈ RX
  0                otherwise
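A brief sketch with scipy.stats.poisson, using the store example's λ = 15:

```python
from scipy.stats import poisson

lam = 15                     # average number of customers from 1pm to 2pm
print(poisson.pmf(15, lam))  # P(exactly 15 customers)
print(poisson.cdf(10, lam))  # P(at most 10 customers)
```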
Poisson Distribution
I The PMF of a Poisson random variable with λ = 1
[Figure: PMF of a Poisson(1) random variable; PX(0) = PX(1) ≈ 0.368, with the PMF decaying rapidly for larger x.]
Poisson Distribution
I The PMF of a Poisson random variable with λ = 10
[Figure: PMF of a Poisson(10) random variable; PX(x) peaks near x = 9 and x = 10 at about 0.125.]
Poisson Distribution
I Example: The number of emails that I get on a weekday can be modeled by a Poisson distribution with an average of 0.2 emails per minute.
I What is the probability that I get no emails in an interval of
length 5 minutes?
I What is the probability that I get more than 3 emails in an
interval of length 10 minutes?
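For a Poisson count, the parameter scales with the length of the interval, so a rate of 0.2 emails per minute gives λ = 1 over 5 minutes and λ = 2 over 10 minutes. A worked check:

```python
from scipy.stats import poisson

rate = 0.2                            # emails per minute

# P(no emails in 5 minutes): lambda = 0.2 * 5 = 1
print(poisson.pmf(0, rate * 5))       # e^(-1) ≈ 0.3679

# P(more than 3 emails in 10 minutes): lambda = 0.2 * 10 = 2
print(1 - poisson.cdf(3, rate * 10))  # 1 - P(X <= 3) ≈ 0.1429
```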
Poisson Distribution
I The Poisson distribution can be viewed as an approximation
of the binomial distribution.
I This is useful as the Poisson PMF is much easier to compute
than the binomial.
I Thus we have
Theorem: Let X ∼ Binomial(n, p = λ/n), where λ > 0 is fixed. Then for any k ∈ {0, 1, 2, ...}, we have
lim_{n→∞} PX(k) = e^(−λ) λ^k / k!
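A quick numerical illustration of the theorem (λ = 3 and k = 2 are illustrative):

```python
from scipy.stats import binom, poisson

lam, k = 3.0, 2
for n in (10, 100, 1000, 10000):
    print(n, binom.pmf(k, n, lam / n))  # approaches the Poisson value
print('limit:', poisson.pmf(k, lam))    # e^(-3) 3^2 / 2! ≈ 0.2240
```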
Cumulative Distribution Function (CDF)
I The cumulative distribution function (CDF) is another way to describe the distribution of a random variable.
I The advantage of the CDF is that it can be defined for any kind of random variable, while the PMF cannot be defined for a continuous random variable.
I We have
Definition: The cumulative distribution function (CDF) of a random variable X is defined as
FX(x) = P(X ≤ x), for all x ∈ R.
Cumulative Distribution Function (CDF)
I In general, let X be a discrete random variable with range
RX = {x1 , x2 , x3 , ...}, such that x1 < x2 < x3 < ....
I FX(x) = 0 for x < x1. Note that the CDF starts at 0, i.e., lim_{x→−∞} FX(x) = 0.
I The CDF has the form of a staircase: it jumps at each point in the range.
I The CDF stays flat between xk and xk+1, so we can write
FX(x) = FX(xk), for xk ≤ x < xk+1.
I The CDF jumps at each xk, and the size of the jump is the PMF value there:
FX(xk) − FX(xk − ε) = PX(xk), for ε > 0 small enough.
Cumulative Distribution Function (CDF)
I The CDF is always a non-decreasing function, i.e., if y ≥ x then FX(y) ≥ FX(x).
I It approaches 1 as x becomes large:
lim_{x→∞} FX(x) = 1.
I If we have the PMF, we can calculate the CDF: if RX = {x1, x2, x3, ...}, then
FX(x) = Σ_{xk ≤ x} PX(xk).
I We also have the useful formula:
For all a ≤ b, we have
P(a < X ≤ b) = FX(b) − FX(a).
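Both identities are easy to check numerically for any discrete distribution; a sketch using a Binomial(10, 0.3) as the example (the choice of distribution is arbitrary):

```python
from scipy.stats import binom

X = binom(10, 0.3)
a, b = 2, 5
print(X.cdf(b) - X.cdf(a))                         # P(2 < X <= 5)
print(sum(X.pmf(k) for k in range(a + 1, b + 1)))  # same, by summing the PMF
```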
Cumulative Distribution Function (CDF)
I The CDF of a discrete random variable.
[Figure: CDF of a discrete random variable: FX(x) = 0 for x < x1, then a staircase that jumps by PX(xk) at each xk, with lim_{x→∞} FX(x) = 1.]
Cumulative Distribution Function
I Let X be a discrete random variable with range RX = {1, 2, 3, ...}. Suppose the PMF of X is given by
PX(k) = 1/2^k.
I Find and plot the CDF of X, FX(x).
I Find P(2 < X ≤ 5).
I Find P(X > 4). (A numerical check follows.)
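A numerical sketch of the exercise: at an integer x ≥ 1 the CDF is FX(x) = 1 − 1/2^x, so P(2 < X ≤ 5) = 7/32 and P(X > 4) = 1/16:

```python
# PX(k) = 1 / 2**k on RX = {1, 2, 3, ...}
def F(x):
    """CDF at x >= 0: sum the PMF over all k <= x."""
    return sum(1 / 2**k for k in range(1, int(x) + 1))

print(F(5) - F(2))  # P(2 < X <= 5) = 7/32 = 0.21875
print(1 - F(4))     # P(X > 4) = 1/16 = 0.0625
```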
Expectation
I The average value of a random variable X is called its expected value or mean.
I The expected value is defined as the weighted average of the
values in the range.
Definition: Let X be a discrete random variable with range RX = {x1, x2, x3, ...} (finite or countably infinite). The expected value of X, denoted EX = E[X] = E(X) = µX, is defined as
EX = Σ_{xk ∈ RX} xk P(X = xk) = Σ_{xk ∈ RX} xk PX(xk).
Expectation
I The expectation for the different special distributions (verified numerically below):
I X ∼ Bernoulli(p): EX = p.
I X ∼ Geometric(p): EX = 1/p.
I X ∼ Poisson(λ): EX = λ.
I X ∼ Binomial(n, p): EX = np.
I X ∼ Pascal(m, p): EX = m/p.
I X ∼ Hypergeometric(b, r, k): EX = kb/(b + r).
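These formulas can be sanity-checked against scipy's built-in means; the parameter values are illustrative, and the Pascal row uses scipy's failure-count convention shifted by m:

```python
from scipy.stats import bernoulli, geom, poisson, binom, nbinom, hypergeom

p, lam, n, m = 0.3, 4.0, 10, 3
b, r, k = 5, 7, 4
print(bernoulli.mean(p), p)                          # p
print(geom.mean(p), 1 / p)                           # 1/p
print(poisson.mean(lam), lam)                        # lambda
print(binom.mean(n, p), n * p)                       # np
print(nbinom.mean(m, p) + m, m / p)                  # m/p, after the shift by m
print(hypergeom.mean(b + r, b, k), k * b / (b + r))  # kb/(b+r)
```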
Expectation
I Expectation is linear:
We have
I E[aX + b] = aEX + b, for all a, b ∈ R;
I E[X1 + X2 + · · · + Xn] = EX1 + EX2 + · · · + EXn, for any set of random variables X1, X2, ..., Xn.
I We can use the linearity of expectation to easily calculate the
expected value of binomial and Pascal distributions.
Functions of Random Variables
I If X is a random variable and Y = g (X ), then Y itself is a
random variable.
I We can thus define its PMF, CDF, and expected value.
I The range of Y, RY, is given by
RY = {g(x) | x ∈ RX}
I If we know the PMF of X, we can obtain the PMF of Y as
PY(y) = P(Y = y) = P(g(X) = y) = Σ_{x: g(x)=y} PX(x).
Functions of Random Variables
I Example: Let X be a discrete random variable with PX(k) = 1/5 for k = −1, 0, 1, 2, 3. Let Y = 2|X|.
I Find the range and PMF of Y.
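A short enumeration sketch of this example; each value of X contributes its probability 1/5 to PY at y = 2|x|:

```python
from collections import Counter

# PX(k) = 1/5 for k in {-1, 0, 1, 2, 3}; Y = 2|X|
pmf_Y = Counter()
for k in (-1, 0, 1, 2, 3):
    pmf_Y[2 * abs(k)] += 1 / 5

print(dict(sorted(pmf_Y.items())))  # {0: 0.2, 2: 0.4, 4: 0.2, 6: 0.2}
# So RY = {0, 2, 4, 6}, with PY(2) = 2/5 and the rest 1/5 each.
```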
Expected Value of a Function of a Random Variable -
LOTUS
I Let X be a discrete random variable and Y = g (X ).
I To calculate the expected value of Y , we can use LOTUS.
Law of the unconscious statistician (LOTUS) for discrete
random variables:
E[g(X)] = Σ_{xk ∈ RX} g(xk) PX(xk)
Expected Value of a Function of a Random Variable -
LOTUS
I Let X be a discrete random variable with range RX = {0, π/4, π/2, 3π/4, π}, such that
PX(0) = PX(π/4) = PX(π/2) = PX(3π/4) = PX(π) = 1/5.
I Find E[sin(X)].
Variance
I Consider random variables X and Y with PMFs
PX(x) =
  0.5  for x = −100
  0.5  for x = 100
  0    otherwise
PY(y) =
  1  for y = 0
  0  otherwise
I EX = EY = 0 but the two distributions are very different.
I Variance is a measure of how spread out the distribution of a
random variable is.
I Variance of Y is quite small as the distribution is
concentrated, while the variance of X is larger.
Variance
The variance of a random variable X , with mean EX = µX ,
is defined as
Var(X) = E[(X − µX)^2].
Computational formula for the variance:
Var(X) = E[X^2] − (EX)^2,
where E[X^2] = Σ_{xk ∈ RX} xk^2 PX(xk).
Standard Deviation and Useful Results
The standard deviation of a random variable X is defined
as
SD(X) = σX = √Var(X).
Theorem: Variance is not a linear operator. For a random
variable X and real numbers a and b,
Var(aX + b) = a^2 Var(X)
Standard Deviation and Useful Results
Theorem: If X1 , X2 , · · · , Xn are independent random vari-
ables and X = X1 + X2 + · · · + Xn , then
Var(X ) = Var(X1 ) + Var(X2 ) + · · · + Var(Xn )
I Example: If X ∼ Binomial(n, p), find Var(X). (Worked below.)
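For the example, writing X as a sum of n independent Bernoulli(p) variables and applying the theorem gives Var(X) = n Var(X1) = np(1 − p). A sketch verifying this and the computational formula (n = 10, p = 0.3 are illustrative):

```python
from scipy.stats import binom

n, p = 10, 0.3
X = binom(n, p)

# Computational formula: Var(X) = E[X^2] - (EX)^2
EX = sum(k * X.pmf(k) for k in range(n + 1))
EX2 = sum(k**2 * X.pmf(k) for k in range(n + 1))
print(EX2 - EX**2, n * p * (1 - p), X.var())  # all ≈ 2.1
```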