Unit-2: Random Variable and Probability Distribution

Random Variable
A random variable (RV) is a variable that takes numerical values based on the outcomes of
a random experiment. It is generally denoted by X.

[Figure: Classification of random variables. Discrete random variables are modeled by the Binomial, Geometric, and Poisson distributions; continuous random variables by the Normal, Exponential, and Uniform distributions.]

• Discrete RV → takes countable values.
Examples: dead/alive, the number on a die, the number of children in a family, the number of defective items in a box, etc.
• Continuous RV → takes uncountable (measured) values.
Examples: height, weight, blood pressure, real numbers between 1 and 5, etc.

Discrete Random Variables


A discrete random variable takes on countable values (finite or countably infinite).
Examples:
1. Number of heads when flipping 3 coins → values: 0, 1, 2, 3
2. Number of students present in a class
3. Roll of a die → values: 1, 2, 3, 4, 5, 6
4. Number of calls received in an hour
5. Marks obtained in a multiple-choice quiz (like 0, 1, 2, 3...)

Continuous Random Variables


A continuous random variable takes on uncountably infinite values within an interval.
These are typically measured (not counted).
Examples:
1. Height of students in a class → e.g., 162.5 cm, 170.3 cm
2. Time taken to complete a test → e.g., 42.75 minutes
3. Weight of a person → e.g., 68.2 kg
4. Temperature in a room → e.g., 24.6°C

5. Distance travelled by a car in an hour → e.g., 60.5 km

Probability Distribution:

A probability distribution describes how probabilities are assigned to the values of a random variable.

Why do we need Probability Distributions?

Most important projects and scientific research studies are conducted with sample data rather than with data from an entire population.
What is a Probability Distribution?
• It is a way to shape the sample data to make predictions and draw conclusions about
an entire population.
• It refers to the frequency at which some events or experiments occur.
• Probability Distribution helps finding all the possible values a random variable can
take between the minimum and maximum possible values.
• Probability Distributions are used to model real-life events for which the outcome is
uncertain.
Once we find the appropriate distribution, we can use it to make inferences and predictions.

You might be certain if you examine the whole population.

• But oftentimes, you only have samples to work with.
• To draw conclusions from sample data, you should compare values obtained from the
sample with the theoretical values obtained from the probability distribution.

There will always be a risk of drawing false conclusions or making false predictions.
• We need to be sufficiently confident before taking any decision, by setting confidence levels.
• These are often set at 90, 95, or 99 percent.

1. Probability Distributions:
• Many probability distributions can be defined by factors such as the mean and
standard deviation of the data.
• Each probability distribution has a formula.
• There are different shapes, models and classifications of probability distributions.

They are often classified into two categories, depending on the nature of the random variable:
• Discrete Probability Distributions.
• Continuous Probability Distributions.

1. Discrete Probability Distribution

Gives the probability of each possible value of a discrete random variable X.

Example-1:

Let X denote the number of heads in 2 coin tosses.

Possible values of X: 0, 1, 2
The sample space is S = {HH, HT, TH, TT}.

X (Heads) | P(X = x) | Favorable Outcomes | Number of Favorable Outcomes
0 | 1/4 | TT | 1
1 | 2/4 | HT, TH | 2
2 | 1/4 | HH | 1

This table is the Probability Mass Function (PMF). It satisfies:

• 0 ≤ P(X = x) ≤ 1
• ∑_x P(X = x) = ∑_{i=1}^{n} p(x_i) = 1
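Such a PMF can also be built mechanically by enumerating the sample space; a minimal Python sketch (using only the standard library; this illustration is not part of the original notes):

```python
# Build the PMF of X = number of heads in 2 tosses by enumerating S.
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=2))          # S = {HH, HT, TH, TT}
counts = Counter(seq.count("H") for seq in outcomes)

pmf = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(pmf)                                        # {0: 0.25, 1: 0.5, 2: 0.25}
assert abs(sum(pmf.values()) - 1) < 1e-12         # total probability is 1
```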

Example-2: Consider an example of tossing a fair coin 3 times.
Define the random variable X: the number of heads obtained.
i.e., X denotes the number of heads when a fair coin is tossed 3 times.

X = Number of Heads | P(X = x) | Favorable Outcomes | Number of Favorable Outcomes
X = 0 | 1/8 | TTT | 1
X = 1 | 3/8 | HTT, THT, TTH | 3
X = 2 | 3/8 | HHT, HTH, THH | 3
X = 3 | 1/8 | HHH | 1

Probability Distribution:
Let X be a random variable; then a probability function f(x) maps the possible values of X to their respective probabilities of occurrence, P(X = x) = p(x).

2. Continuous Probability Distribution


Gives probabilities over intervals, not individual points.
Described by a Probability Density Function (PDF), denoted by f(x).

Example-3:
Let X = time taken to finish a task, uniformly distributed between 0 and 1 hour.

f(x) = 1 for 0 ≤ x ≤ 1
     = 0 otherwise

Then,
P(0.2 < X < 0.5) = ∫_{0.2}^{0.5} f(x) dx = 0.3

Note: For continuous RVs, P(X = a) = 0, but P(a < X < b) > 0.
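A quick numerical check of Example-3 (a sketch assuming SciPy is available; SciPy's uniform defaults to the interval [0, 1]):

```python
# P(0.2 < X < 0.5) for X ~ Uniform(0, 1), via the CDF.
from scipy.stats import uniform

print(uniform.cdf(0.5) - uniform.cdf(0.2))   # 0.3
```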

Summary:
Random Variable Type | Function Used | Key Feature
Discrete R.V. | PMF: P(X = x) | Probabilities for exact values
Continuous R.V. | PDF: f(x) | Probabilities over intervals only

1. Discrete Probability Distributions (PMFs)

Bernoulli distribution, Binomial distribution, Negative Binomial distribution, Geometric distribution, Hypergeometric distribution, Poisson distribution, etc.

1.1 Bernoulli Distribution: The Bernoulli distribution is the simplest discrete probability distribution, used to model a random experiment with only one trial and two outcomes: success or failure (1 or 0), head or tail, pass or fail, defective or not defective, etc.
The PMF (Probability Mass Function) of the Bernoulli distribution is:
P(X = x) = p^x (1 − p)^(1−x), x = 0, 1
i.e., P(X = x) = p if x = 1, and 1 − p if x = 0, where 0 ≤ p ≤ 1
and p is the probability of success.
We write this as:
X ~ Bernoulli(p)

1.2 Binomial Distribution: The Binomial distribution is a discrete probability distribution that models the number of successes in n independent Bernoulli trials (n fixed). The random variable X represents the number of successes in these n trials.
Conditions for a Binomial Distribution:

A discrete random variable X follows a binomial distribution if:

1. The number of trials n is fixed.


2. Each trial is independent.
3. Each trial has only two outcomes: success (with probability p) or failure (with
probability 𝑞 = 1 − 𝑝).
4. The probability of success p remains the same in each trial.

Notation for the Binomial distribution: X ~ Bin(n, p)

This means the random variable X follows a binomial distribution with parameters n and p.

Probability Mass Function (PMF) of Binomial distribution:


The probability of getting exactly x successes in n trials is:
P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, 2, …, n
where C(n, x) = n! / (x!(n − x)!)
• The Binomial distribution is fully defined once we know n and p, so n and p are called the parameters of the Binomial distribution, where
• n: number of trials
• p: probability of success
• q = 1 − p: probability of failure
• Note that n is a discrete parameter whereas p is a continuous parameter, as 0 < p < 1.

Parameters of the Binomial distribution:

Mean of random variable X = μ = E(X) = expected value of X = np
Variance of random variable X = Var(X) = σ² = E(X²) − (E(X))² = npq
Standard deviation = σ = √Var(X) = √(npq)
Proof (Hint): Let X = X₁ + X₂ + ⋯ + Xₙ,
where X ~ Bin(n, p) and Xᵢ ~ Bernoulli(p) for i = 1, 2, …, n.

Example-1: A coin is tossed 5 times. What is the probability of getting exactly 3 heads?
Solution:

Example-2: Decide whether the experiments given below are a binomial experiment. If it is,
specify the values of n, p, and q, and list the possible values of the random variable X. If it is
not a binomial experiment, explain why?

(i). You randomly select a card from a deck of cards, and note if the card is a king. You then
put the card back and repeat this process 8 times.
Yes…This is a binomial experiment. Each of the 8 selections represent an independent trial
because the card is replaced before the next one is drawn. There are only two possible
outcomes: (i) Either the card is a king or (ii) The card is not a king.

(ii). You roll a die 10 times and note the number the die lands on.
No…. This is not a binomial experiment. While each trial (roll) is independent, there are
more than two possible outcomes: 1, 2, 3, 4, 5, and 6.

So, Binomial Distribution is a discrete probability distribution that is used for data which can
only take one of two values, i.e.
• Pass or fail.
• Yes or no.
• Good or defective.
Binomial Distribution allows to compute the probability of the number of successes for a
given number of trials.
Success could mean anything you want to consider as a positive or negative outcome.

Recurrence Relation for the Binomial Distribution: the PMF of the Binomial distribution is:
P(X = x) = C(n, x) p^x (1 − p)^(n−x)
P(X = x + 1) = C(n, x + 1) p^(x+1) (1 − p)^(n−x−1)
Dividing,
P(X = x + 1)/P(X = x) = p(x + 1)/p(x) = ((n − x)/(x + 1)) · (p/q)
so
P(X = x + 1) = ((n − x)/(x + 1)) · (p/q) · p(x)
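The recurrence lets us generate the whole PMF from P(X = 0) = qⁿ without computing factorials; a minimal Python sketch (an illustration, not part of the original notes):

```python
# Build all Binomial(n, p) probabilities from P(X=0) = q^n using
# P(X=x+1) = ((n-x)/(x+1)) * (p/q) * P(X=x).
def binomial_pmf(n, p):
    q = 1 - p
    probs = [q ** n]                       # P(X = 0)
    for x in range(n):
        probs.append(probs[-1] * (n - x) / (x + 1) * p / q)
    return probs

pmf = binomial_pmf(5, 0.5)
print(pmf[3])    # P(X = 3) for 5 fair-coin tosses: 10/32 = 0.3125 (Example-1)
```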

Example-3: The mean and standard deviation of a binomial distribution are 5 and 2. Determine the distribution.
Hint for Solution: Let X be a discrete random variable that follows a Binomial distribution, i.e., X ~ Bin(n, p).
We know that the PMF of the Binomial distribution is:
P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, 2, …, n
Mean = μ = E(X) = np
Variance of X = Var(X) = σ² = npq

Example-4: The mean and variance of a binomial variate are 8 and 6. Find P(X ≥ 2).

Example-5: With the usual notation, find p for a binomial distribution if n = 6 and 9P(X = 4) = P(X = 2).

Example-6: Two dice are thrown five times. Find the probability of getting a sum of 7 (i) at least once, (ii) two times, and (iii) P(1 < X < 5).

Example-7: The incidence of corona in an industry is such that the workers have a 20% chance of suffering from it. What is the probability that out of 6 workers chosen at random, four or more will suffer from corona?

Example-8: If hens of a certain breed lay eggs on 5 days a week on average, find on how many days during a season of 100 days a poultry keeper with 5 hens of this breed would expect to receive at least 4 eggs.

1.3 Negative Binomial Distribution:

Let the random variable X represent the total number of Bernoulli trials required to get r successes. Then
P(X = x) = C(x − 1, r − 1) p^r (1 − p)^(x−r), x = r, r + 1, r + 2, …

1.4 Geometric Distribution: The random variable X counts the number of Bernoulli trials required until the 1st success occurs.
Let p be the probability of success in each trial, and q = 1 − p be the probability of failure.
Then the probability mass function (PMF) of the random variable X is:
P(X = x) = p(1 − p)^(x−1), x = 1, 2, …

Expected value of a Geometric random variable:

μ = E(X) = ∑_x x·P(X = x) = ∑_{i=1}^{∞} i·p·(1 − p)^(i−1) = ∑_{i=1}^{∞} i·p·q^(i−1)
 = ∑_{i=1}^{∞} (i − 1 + 1)·q^(i−1)·p = ∑_{i=1}^{∞} (i − 1)·q^(i−1)·p + ∑_{i=1}^{∞} q^(i−1)·p
Putting i − 1 = j:
E(X) = ∑_{j=1}^{∞} j·p·q^j + 1 = q·∑_{j=1}^{∞} j·p·q^(j−1) + 1 = q·E[X] + 1
(1 − q)·E[X] = 1, which implies E[X] = 1/p
In other words, if independent trials having a common probability p of success are performed until the first success occurs, then the expected number of required trials equals 1/p. For instance, the expected number of rolls of a fair die needed to obtain the value 1 is 6.
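A Monte Carlo sketch of this fact (a fair die is rolled until the first 1; the average number of rolls should come out near 1/p = 6; illustration only):

```python
# Empirical check of E[X] = 1/p for the geometric distribution.
import random

def rolls_until_one():
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 1:     # success with p = 1/6
            return n

trials = 100_000
avg = sum(rolls_until_one() for _ in range(trials)) / trials
print(avg)                                # close to 6
```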

Variance of a Geometric random variable:

We know that Var(X) = E[X²] − (E[X])².
Now,
E[X²] = ∑_x x²·P(X = x) = ∑_{i=1}^{∞} i²·p·(1 − p)^(i−1) = ∑_{i=1}^{∞} i²·p·q^(i−1)
 = ∑_{i=1}^{∞} (i − 1 + 1)²·p·q^(i−1) = ∑_{i=1}^{∞} (i − 1)²·p·q^(i−1) + 2∑_{i=1}^{∞} (i − 1)·p·q^(i−1) + ∑_{i=1}^{∞} p·q^(i−1)
 = q·∑_{j=0}^{∞} j²·p·q^(j−1) + 2∑_{j=1}^{∞} j·p·q^j + ∑_{j=0}^{∞} p·q^j
 = q·E[X²] + 2q·E[X] + 1
p·E[X²] = 2q/p + 1
E[X²] = 2q/p² + 1/p = (2q + p)/p² = (q + 1)/p²
So,
Var(X) = E[X²] − (E[X])² = (q + 1)/p² − 1/p² = q/p²
1.5 Hypergeometric Distribution:
The hypergeometric distribution models the number of successes in a sample drawn
without replacement from a finite population.
We write:
𝑋~Hypergeometric(𝑁, 𝐾, 𝑛)
Where:

N: total number of items (population size)


K: number of successes in the population
n: number of items drawn (sample size)
X: number of successes in the sample

Then the PMF of the Hypergeometric distribution is:
P(X = x) = C(K, x)·C(N − K, n − x) / C(N, n); max(0, n − (N − K)) ≤ x ≤ min(n, K)
Explanation of the above PMF:
• C(K, x): ways to choose x successes from the K successful items
• C(N − K, n − x): ways to choose the remaining n − x items from the N − K failures
• C(N, n): total ways to choose n items from the population

Example-9: From a population of 20 items, 7 are defective. If 5 items are selected without
replacement, then:

Let X = Number of defective items in the sample.


Then 𝑋 ∼ 𝐻𝑦𝑝𝑒𝑟𝑔𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐 (𝑁 = 20, 𝐾 = 7, 𝑛 = 5)

Here, X is a discrete random variable because it can take only integer values like 0, 1, 2, ...,
up to 𝑚𝑖𝑛(𝐾, 𝑛) = 𝑚𝑖𝑛(7, 5) = 5.

Example-10: A box contains 10 balls: 4 red (successes) and 6 white (failures). 3 balls are
chosen at random without replacement.

Solution: Let X be the number of red balls chosen.

Then, N=10, K=4, n=3

PMF:

P(X = x) = C(4, x)·C(6, 3 − x) / C(10, 3); x = 0, 1, 2, 3
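These probabilities can be checked with SciPy's hypergeometric distribution (a sketch, not part of the notes; note SciPy's argument order is population size, number of successes, sample size):

```python
# Example-10 check: N = 10 balls, K = 4 red (successes), n = 3 drawn.
from scipy.stats import hypergeom

rv = hypergeom(10, 4, 3)
for x in range(4):
    print(x, rv.pmf(x))       # 1/6, 1/2, 3/10, 1/30
```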

1.6 Poisson Distribution: The Poisson distribution is used when we want to count the number of events in a fixed interval.
P(X = x) = p(x) = λ^x e^(−λ) / x!, x = 0, 1, 2, …

Where 𝜆 is called the parameter of Poisson distribution.


Discrete probability distributions are used in tossing coins (Bernoulli/Binomial), modeling the number of trials until success (Geometric), and counting calls at a call center (Poisson).

The Poisson distribution holds under the following conditions:

(i) The random variable X is discrete.
(ii) The number of trials n is very large.
(iii) The probability of success p is very small (very close to zero).
(iv) λ = np is finite.
(v) The occurrences are rare.

Examples of Poisson approximation:

• Number of defective bulbs produced by a reputed company.
• Number of telephone calls per minute.
• Number of cars passing a certain point in one minute.
• Number of printing mistakes per page in a large text.
• Number of persons born blind per year in a large city.

Observation: We can observe that Poisson distribution is the limiting case of Binomial
distribution.

Poisson Approximation to the Binomial Distribution:

The Poisson distribution is the limiting case of the Binomial distribution under the following conditions:
(i) The number of trials is infinitely large, i.e., n → ∞.
(ii) The probability of success p in each trial is very small, i.e., p → 0.
(iii) np = λ is finite, where λ is a constant.

Then, as n → ∞, the Binomial distribution tends to the Poisson distribution.


Proof: Consider the PMF of the Binomial distribution:
P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, 2, …, n
 = C(n, x) (p/(1 − p))^x (1 − p)^n
Now let λ = np, so p = λ/n. Then
P(X = x) = [n(n − 1)(n − 2)⋯(n − (x − 1))·(n − x)! / (x!·(n − x)!)] · (λ/n)^x · (1 − λ/n)^(−x) · (1 − λ/n)^n
 = (λ^x / x!) · (n/n)·((n − 1)/n)·((n − 2)/n)⋯((n − (x − 1))/n) · (1 − λ/n)^n · (1 − λ/n)^(−x)
Now taking limits of both sides as n → ∞: each factor (n − k)/n → 1, (1 − λ/n)^n → e^(−λ), and (1 − λ/n)^(−x) → 1, so
P(X = x) = e^(−λ) λ^x / x!; x = 0, 1, 2, …
In this way, we have proved our claim.
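The limit can also be seen numerically: holding λ = np fixed and letting n grow, the Binomial pmf approaches the Poisson pmf (a sketch assuming SciPy; λ = 2 and k = 3 are arbitrary illustration values):

```python
# Binomial(n, λ/n) pmf at k = 3 approaches Poisson(λ) pmf as n grows.
from scipy.stats import binom, poisson

lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    print(n, binom.pmf(3, n, p), poisson.pmf(3, lam))
```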

Mean of the Poisson Distribution:

We know that
Mean of X = μ = expected value of X = E[X] = ∑_{x=0}^{∞} x·p(x)
E[X] = ∑_{x=0}^{∞} x·(λ^x e^(−λ) / x!) = e^(−λ) ∑_{x=1}^{∞} λ·λ^(x−1)/(x − 1)! = λ·e^(−λ) ∑_{x=1}^{∞} λ^(x−1)/(x − 1)! = λ·e^(−λ)·e^λ = λ

So, Mean = λ

Variance of the Poisson Distribution:

Var(X) = E[X²] − (E[X])² = ∑_{x=0}^{∞} x²·p(x) − μ² = ∑_{x=0}^{∞} x²·(λ^x e^(−λ)/x!) − λ²
 = ∑_{x=0}^{∞} (x² − x + x)·(λ^x e^(−λ)/x!) − λ² = ∑_{x=0}^{∞} (x(x − 1) + x)·(λ^x e^(−λ)/x!) − λ²
 = ∑_{x=2}^{∞} λ^x e^(−λ)/(x − 2)! + ∑_{x=1}^{∞} λ^x e^(−λ)/(x − 1)! − λ²
 = e^(−λ)[λ² ∑_{x=2}^{∞} λ^(x−2)/(x − 2)! + λ ∑_{x=1}^{∞} λ^(x−1)/(x − 1)!] − λ²
 = e^(−λ)[λ² e^λ + λ e^λ] − λ² = λ² + λ − λ² = λ

So, Var(X) = λ
Standard deviation = √variance = √λ

Recurrence Relation for the Poisson Distribution:

As we discussed, for the Poisson distribution:
p(x) = λ^x e^(−λ)/x! and p(x + 1) = λ^(x+1) e^(−λ)/(x + 1)!

p(x + 1)/p(x) = [λ^(x+1) e^(−λ)/(x + 1)!] · [x!/(λ^x e^(−λ))] = λ/(x + 1), which implies p(x + 1) = (λ/(x + 1))·p(x)

This is known as the recurrence relation for the Poisson distribution.
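A sketch using this recurrence to build Poisson probabilities starting from p(0) = e^(−λ) (illustration only; usable for questions like Example-13 below):

```python
# Generate p(0), p(1), ..., p(x_max) for Poisson(λ) via the recurrence.
import math

def poisson_pmf(lam, x_max):
    probs = [math.exp(-lam)]               # p(0) = e^{-λ}
    for x in range(x_max):
        probs.append(probs[-1] * lam / (x + 1))
    return probs

print(poisson_pmf(1.8, 5))
```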

Example-11: If the mean of a Poisson variable is 1.8, find (i) P(X > 1), (ii) P(X = 5) and (iii)
P(0 < X < 5)

Example-12: If a random variable has a Poisson distribution such that P(X = 1) = P(X = 2),
find (i) the mean of the distribution, (ii) P(X ≥ 1), and (iii) P(1 <X< 4).
Solution: 𝜆 = 2

Example-13: If X is a Poisson variate such that P(X = 0) = P(X = 1), find P(X = 0) and, using the recurrence relation, find the probabilities at X = 1, 2, 3, 4, and 5.
Solution: P(X = 0) = P(X = 1) gives e^(−λ) = λe^(−λ), so λ = 1 and P(X = 0) = e^(−1).

Example-14: If X is a Poisson variate such that P(X = 2) = 9P(X = 4) + 90P(X = 6), find (i) the mean of X, (ii) the variance of X.
Solution: (i) 1 (ii) 1

Example-15: An insurance company insured 4000 people against loss of both eyes in a car accident. Based on previous data, the rates were computed on the assumption that, on average, 10 persons in 100000 will have car accidents each year that result in this type of injury. What is the probability that more than 3 of the insured will collect on their policy in a given year?
Solution: Using the Poisson approximation to the Binomial:
λ = n·p = 4000 × (10/100000) = 0.4
P(X ≥ 4) = 1 − P(X ≤ 3) ≈ 0.000776252
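The same number can be checked with SciPy (a sketch; poisson.cdf gives P(X ≤ k)):

```python
# Example-15 check: λ = 0.4, P(X ≥ 4) = 1 − P(X ≤ 3).
from scipy.stats import poisson

lam = 4000 * 10 / 100000          # 0.4
print(1 - poisson.cdf(3, lam))    # ≈ 0.000776
```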

Example-16: Suppose a book of 585 pages contains 43 typographical errors. If these errors are randomly distributed throughout the book, what is the probability that 10 pages, selected at random, will be free from errors?
Solution: This problem can be modeled using a hypergeometric distribution or approximated with a binomial distribution due to the large number of pages.
Assuming each error falls on a distinct page, the probability that a randomly selected page is error-free is:
p = 542/585 ≈ 0.9265
P(all 10 error-free) = p^10 ≈ 0.464717369
Exact answer (using Hypergeometric) ≈

Probability Distribution | X represents what?
Bernoulli | The outcome (success or failure) of a single trial of a binary (two-outcome) experiment
Binomial | Number of successes in n Bernoulli trials
Negative Binomial | Total number of Bernoulli trials required to get r successes
Geometric | Number of Bernoulli trials required until the 1st success
Hypergeometric | Number of successes in a sample drawn without replacement from a finite population
Poisson | Number of events in a fixed interval
Table: Summary of the Mean and Variance for some important Discrete Probability Distributions:

Probability Distribution | PMF | Mean = μ = E[X] | Var(X) = σ² = E[X²] − (E[X])²
Bernoulli | P(X = x) = p^x (1 − p)^(1−x), x = 0, 1 | p | p(1 − p) = pq
Binomial | P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, 2, …, n | np | npq
Negative Binomial | P(X = x) = C(x − 1, r − 1) p^r (1 − p)^(x−r), x = r, r + 1, r + 2, … | r/p | rq/p²
Geometric | P(X = x) = p(1 − p)^(x−1), x = 1, 2, … | 1/p | q/p²
Hypergeometric | P(X = x) = C(K, x)C(N − K, n − x)/C(N, n), max(0, n − (N − K)) ≤ x ≤ min(n, K) | n·K/N | n·(K/N)·(1 − K/N)·(N − n)/(N − 1)
Poisson | p(x) = λ^x e^(−λ)/x!, x = 0, 1, 2, … | λ | λ

The Moment Generating Function (MGF) of a random variable is a tool used to generate moments (mean, variance, etc.) and is defined as M_X(t) = E[e^(tX)]. For example, for the Geometric distribution:

M_X(t) = E[e^(tX)] = ∑_{x=1}^{∞} e^(tx)·P(X = x) = p·e^t / (1 − q·e^t), for t < −ln(q)

The nth moment about the origin is obtained by:
μ′ₙ = E[Xⁿ] = dⁿM_X(t)/dtⁿ evaluated at t = 0
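As an illustration, the geometric mean and variance can be recovered symbolically from this MGF by differentiating at t = 0 (a sketch assuming SymPy is available):

```python
# Recover E[X] = 1/p and Var(X) = q/p^2 from the geometric MGF.
import sympy as sp

t, p = sp.symbols("t p", positive=True)
q = 1 - p
M = p * sp.exp(t) / (1 - q * sp.exp(t))           # MGF of Geometric(p)

m1 = sp.simplify(sp.diff(M, t).subs(t, 0))        # E[X]   = 1/p
m2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))     # E[X^2] = (q+1)/p^2
print(m1, sp.simplify(m2 - m1**2))                # variance = (1-p)/p^2
```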

2. Continuous Probability Distributions (PDFs)


In this section, we will discuss 4 types of continuous probability distributions.
Uniform distribution, Normal (Gaussian) distribution, Exponential distribution, Gamma
distribution.

For a continuous random variable X, a probability density function f(x) satisfies:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx
∫_{−∞}^{+∞} f(x) dx = 1 (total probability = 1)
Important Notes:

• f(x) is NOT the probability that 𝑋 = 𝑥 (in fact, 𝑃(𝑋 = 𝑥) = 0 for continuous
variables).
• Instead, f(x) represents the density at point X=x, not the probability.

2.1 Uniform Distribution: equal probability over an interval.

f(x) = 1/(b − a) for a ≤ x ≤ b
     = 0 otherwise

2.2 Normal Distribution: the bell-shaped distribution (used everywhere!).

The Normal distribution is entirely described by its mean (μ) and standard deviation (σ).
X ~ N(μ, σ²), i.e., a continuous random variable X is said to be normally distributed if its probability density function (pdf) is:

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), where −∞ < x < ∞, −∞ < μ < ∞, σ > 0

Here, μ and σ are called the parameters of the Normal distribution.

The curve representing the normal distribution is called the normal curve.

That means, if X is a normal random variable with mean μ and standard deviation σ, then the probability of X lying in the interval (x₁, x₂) is:
P(x₁ ≤ X ≤ x₂) = ∫_{x₁}^{x₂} f(x) dx
Properties of the Normal Distribution:

(i) It is a bell-shaped curve, symmetrical about the ordinate X = μ. The ordinate is maximum at X = μ.
(ii) It is a unimodal curve and its tails extend infinitely in both directions, i.e., the curve is asymptotic to the X-axis in both directions.
(iii) All three measures of central tendency coincide, i.e., mean = median = mode.
(iv) The total area under the curve gives the total probability of the random variable X taking values between −∞ and ∞. Mathematically,
P(−∞ ≤ X ≤ ∞) = ∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) dx = 1

The ordinate at X = μ divides the area under the normal curve into two equal parts, i.e.
∫_{−∞}^{μ} f(x) dx = ∫_{μ}^{∞} f(x) dx = 1/2
(v) The value of f(x) is always non-negative for all values of X, because the whole curve lies above the X-axis.
(vi) The area under the normal curve is distributed as follows:
(a) The area between the ordinates at μ − σ and μ + σ is 68.27%
(b) The area between the ordinates at μ − 2σ and μ + 2σ is 95.45%
(c) The area between the ordinates at μ − 3σ and μ + 3σ is 99.73%

Parameters (Mean, Variance, …) of the Normal Distribution:

The mean of the Normal distribution is:
Mean of X = E[X] = ∫_{−∞}^{∞} x·f(x) dx = ∫_{−∞}^{∞} x·(1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) dx = μ

Moreover, Median = Mode = μ.

So, the Normal distribution is a symmetrical distribution.

Important Observation:
The mean of a random variable works as the centre of gravity for the equally distributed area under the surface given by f(x). That is,
∫_{−∞}^{μ} (μ − x)·f(x) dx = ∫_{μ}^{∞} (x − μ)·f(x) dx

Important Integral: ∫_{−∞}^{∞} x²·(1/√(2π))·e^(−x²/2) dx = 1; this is a known standard integral.

This is the second moment about the mean of the standard normal distribution, i.e., Z ~ N(0, 1). That is, E(Z²) = 1.

Let φ(x) = (1/√(2π)) e^(−x²/2).

Recognize that it is an even function. We use the known result that the above integral is a classic Gaussian-type integral involving an even power of x, of the form:

∫_{−∞}^{∞} x²·e^(−ax²) dx = √π/(2a^(3/2)), for a > 0

Some other Important Integrals:
(i) Standard Gaussian integral:

∫_{−∞}^{∞} e^(−ax²) dx = √(π/a)

Probability of a Normal Random Variable in an Interval:

If X is a normal random variable with mean μ and standard deviation σ, then the probability of X lying in the interval (x₁, x₂) is:
P(x₁ ≤ X ≤ x₂) = ∫_{x₁}^{x₂} f(x) dx

It looks very difficult to deal with this integration… Isn't it?

Now, P(x₁ ≤ X ≤ x₂) can be evaluated easily by converting the normal random variable X into another random variable Z.
Let Z = (X − μ)/σ be a new random variable; then:

E(Z) = E((X − μ)/σ) = (1/σ)[E(X) − μ] = 0

Var(Z) = Var((X − μ)/σ) = (1/σ²)·Var(X − μ) = (1/σ²)·Var(X) = 1

• The distribution of Z is also normal.
• Thus, if X is a normal random variable with mean μ and standard deviation σ, then Z is a normal random variable with mean 0 and standard deviation 1.
• Since the parameters of the distribution of Z are fixed, it is a known distribution, termed the standard normal distribution; further, Z is termed a standard normal variate. Thus, the distribution of any normal variate X can always be transformed into the distribution of the standard normal variate Z.

P(x₁ ≤ X ≤ x₂) = P((x₁ − μ)/σ ≤ (X − μ)/σ ≤ (x₂ − μ)/σ) = P(z₁ ≤ Z ≤ z₂)
where z₁ = (x₁ − μ)/σ and z₂ = (x₂ − μ)/σ.
So, the probability P(x₁ ≤ X ≤ x₂) is equal to the area under the standard normal curve between the ordinates at Z = z₁ and Z = z₂.

Case I: If both z₁ and z₂ are positive:

P(x₁ ≤ X ≤ x₂) = P(z₁ ≤ Z ≤ z₂) = P(0 ≤ Z ≤ z₂) − P(0 ≤ Z ≤ z₁)
(If both are negative, use symmetry: P(z₁ ≤ Z ≤ z₂) = P(|z₂| ≤ Z ≤ |z₁|).)

Case II: If z₁ < 0 and z₂ > 0:

P(x₁ ≤ X ≤ x₂) = P(z₁ ≤ Z ≤ z₂)
 = P(z₁ ≤ Z ≤ 0) + P(0 ≤ Z ≤ z₂) = P(0 ≤ Z ≤ |z₁|) + P(0 ≤ Z ≤ z₂)

Some other cases, for P(X > x₁):
(I) If z₁ > 0:
P(X > x₁) = P(Z > z₁)
 = 0.5 − P(0 ≤ Z ≤ z₁)

(II) If z₁ < 0:
P(X > x₁) = P(Z > z₁)
 = 0.5 + P(z₁ < Z < 0)
 = 0.5 + P(0 < Z < |z₁|)

Standard Normal Z-Table (see Appendix)

Uses of Normal Distribution:

1. The normal distribution can be used to approximate binomial and Poisson distributions.
2. It is used extensively in sampling theory. It helps to estimate parameters from statistics
and to find confidence limits of the parameter.

3. It is widely used in testing statistical hypotheses and tests of significance, in which it is always assumed that the population from which the samples have been drawn has a normal distribution.
4. It serves as a guiding instrument in the analysis and interpretation of statistical data.
5. It can be used for smoothing and graduating a distribution which is not normal, simply by fitting a normal curve.
Example-17: What is the probability that a standard normal variate Z will be (i) greater than 1.09, (ii) less than or equal to −1.65, (iii) lying between −1 and 1.96, (iv) lying between 1.25 and 2.75?
Solution: (i) 0.1379 (ii) 0.0495 (iii) 0.8163 (iv) 0.1026
Example-18: If X is a normal variate with a mean of 30 and an SD of 5, find the probabilities
that (i) 26 ≤ 𝑋 ≤ 40, and (ii) 𝑋 ≥ 45?
Solution: (i) We compute the probabilities for the given intervals using the standard normal distribution. The standard normal variable is Z = (X − μ)/σ.

P(26 ≤ X ≤ 40) = P(−0.8 ≤ Z ≤ 2) = 0.7653

(ii) P(X ≥ 45) = P(45 ≤ X < ∞) = P(Z ≥ 3) = 0.5 − P(0 < Z < 3)
 = 0.5 − 0.4987 = 0.0013
Example-19: A manufacturer knows from his experience that the resistances of resistors he
produces is normal with mean = 100 ohms and SD = 2 ohms. What percentage of
resistors will have resistances between 98 ohms and 102 ohms?

Solution: Let the normal random variable X ~ N(μ, σ²) represent the resistance of the resistors, with mean μ = 100 ohms and standard deviation σ = 2 ohms. We need the percentage of resistors with resistances between 98 ohms and 102 ohms, i.e., P(98 ≤ X ≤ 102).
Converting the bounds to standard normal Z-scores using Z = (X − μ)/σ:

Z₁ = (98 − 100)/2 = −1, Z₂ = (102 − 100)/2 = 1

So:

P(98 ≤ X ≤ 102) = P(−1 ≤ Z ≤ 1) = 0.6826 = 68.26%
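A quick check of this value with SciPy's standard normal CDF (a sketch, not part of the original solution):

```python
# P(-1 ≤ Z ≤ 1) for the standard normal distribution.
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(-1))   # ≈ 0.6827, i.e. about 68.26-68.27%
```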

Example-20: The average seasonal rainfall in a place is 16 inches with an SD of 4 inches. What is the probability that the rainfall in that place will be between 20 and 24 inches in a year?
Solution: 0.4772 − 0.3413 = 0.1359
Important Note:
Having dealt with the previous examples, let us collect some important facts:
(1) P(X < x₁) = F(x₁) = ∫_{−∞}^{x₁} f(x) dx

Hence, P(X < x₁) represents the area under the curve from X = −∞ to X = x₁.
(2) If P(X < x₁) < 0.5, then the point x₁ lies to the left of X = μ and the corresponding value of the standard normal variate will be negative.

(3) If P(X < x₁) > 0.5, then the point x₁ lies to the right of X = μ and the corresponding value of the standard normal variate will be positive.

Example-21: If X is a normal variate with a mean of 120 and a standard deviation of 10, find c such that (i) P(X > c) = 0.02, and (ii) P(X < c) = 0.05.
Solution: (i) c = 140.54 (ii) c = 103.55
Example-22: Assume that the mean height of Indian soldiers is 68.22 inches with a variance of 10.8 square inches. How many soldiers in a regiment of 1000 would you expect to be over 6 feet tall?
Solution: σ = √10.8 = 3.286; 6 feet = 72 inches, so z₁ = (72 − 68.22)/3.286 = 1.15
p = P(Z > 1.15) = 0.1251
Expected number over 6 ft among 1000 soldiers = 1000 × 0.1251 ≈ 125

2.3 Exponential Distribution: A continuous random variable X follows the Exponential distribution if its probability density function is:
f(x) = λe^(−λx), x ≥ 0, where λ is the rate of the distribution.
The Exponential distribution is used when we want to model the time between random events (e.g., waiting time).

Parameters (Mean, Variance, …) of the Exponential Distribution:

Mean = 1/λ, Variance = 1/λ²

Note:
• When times between random events follows the Exponential distribution with rate λ,
then the total number of events in a time period of length t follows the Poisson
distribution with parameter λt.
• Exponential distribution is memoryless distribution.

Relation between Poisson & Exponential:

Poisson | Exponential
Number of hits to Marwadi University's website in one minute. | Number of minutes between two hits to Marwadi University's website.
Number of soldiers killed by horse-kick per year. | Number of years between horse-kick deaths of soldiers.
Number of customers arriving at the first floor's Tea Post in one hour. | Number of hours between two customers arriving at the first floor's Tea Post.
So: events per single unit of time. | So: time per single event.

i.e., an Exponential variate is the time between events that follow a Poisson distribution (you may think of it as the 'inverse' of Poisson).

Example-23: Let X be an Exponential random variate with probability density function
f(x) = (1/5)e^(−x/5) for x > 0
     = 0 otherwise
Then find:
(i) P(X > 5) (ii) P(3 ≤ X ≤ 6) (iii) Mean (iv) Variance.
Solution: (i) e^(−1) ≈ 0.3679 (ii) e^(−3/5) − e^(−6/5) ≈ 0.2476 (iii) Mean = 1/λ = 5 (iv) Variance = 1/λ² = 25

Example-24: Let X be an Exponential random variate with probability density function
f(x) = ce^(−2x) for x > 0
     = 0 otherwise
Find (i) P(X > 2), (ii) P(X < 1/c), (iii) P(1/c < X < c).
Solution: c = 2; (i) e^(−4) (ii) 1 − e^(−1) (iii) (1 − e^(−4)) − (1 − e^(−1)) = e^(−1) − e^(−4)

Example-25: The mileage which car owners get with a certain kind of radial tire is a random variable having an exponential distribution with mean 4000 km. Find the probabilities that one of these tires will last (i) at least 2000 km, (ii) at most 3000 km.
Solution: X ~ Exponential with rate λ = 1/4000.
(i) P(X ≥ 2000) = e^(−0.5) ≈ 0.6065 (ii) P(X ≤ 3000) = 1 − e^(−0.75) ≈ 0.5276

Example-26: The daily consumption of milk in excess of 20000 gallons is approximately exponentially distributed with mean 3000 gallons. The city has a daily stock of 35000 gallons. What is the probability that, of 2 days selected at random, the stock is insufficient for both days?
Solution (Hint): P(X > 15000) = e^(−15000/3000) = e^(−5); by independence, the required probability is (e^(−5))² = e^(−10).

Memorylessness of the Exponential distribution:

As we know, an Exponential variate is the time between events of a Poisson process; in other words, the Exponential variate is the "inverse" of the Poisson.
Let's brush up our knowledge…
Some requirements for Poisson and Exponential variates are:

Example-27: Suppose Dr. Chetan starts monitoring visits to the Marwadi University website from 9:00 am.
So, mathematically:
• The exponential distribution has the memoryless (forgetfulness) property.
• This property indicates that the distribution is independent of its past; that means the future occurrence of an event has no relation to whether or not this event has happened in the past.
Mathematically, this property can be expressed as:
If X is exponentially distributed and s & t are two positive real numbers, then
P[(X > s + t) | (X > s)] = P(X > t)

Is there any inequality between s and t?


Good observation
In the memoryless property of the exponential distribution:
𝑃(𝑋 > 𝑠 + 𝑡 ∣ 𝑋 > 𝑠) = 𝑃(𝑋 > 𝑡),
the only conditions are 𝑠 ≥ 0, 𝑡 ≥ 0
There is no inequality restriction between s and t (like 𝑠 < 𝑡 𝑜𝑟 𝑡 < 𝑠).
• s = the time that has already elapsed.
• t = the additional time we are considering.

So s and t are just non-negative real numbers.

For example:

• If 𝑠 = 8, 𝑡 = 3: we check “at least 11 hours given it already lasted 8”.


• If 𝑠 = 8, 𝑡 = 20: we check “at least 28 hours given it already lasted 8”.

Both are valid.

Example-28: The time (in hours) required to repair a machine is exponentially distributed with mean 2 hours. (i) What is the probability that the repair time exceeds 2 hours? (ii) What is the probability that a repair takes at least 11 hours, given that its duration exceeds 8 hours?
Solution: Let X denote the repair time, where X ~ Exp(λ). For an exponentially distributed random variable X with mean μ = 2 hours,
λ = 1/μ = 1/2 per hour
The probability density function is f(x) = λe^(−λx), x ≥ 0, and the cumulative distribution function is F(x) = 1 − e^(−λx).
(i) P(X > 2) = 1 − P(X ≤ 2) = 1 − F(2) = e^(−0.5×2) = e^(−1) ≈ 0.3679
So, the probability that the repair time exceeds 2 hours is approximately 0.3679.
(ii) We need the conditional probability P(X ≥ 11 | X > 8).
For an exponential distribution, the memoryless property applies, which states:
P(X ≥ s + t | X > s) = P(X ≥ t) OR P(X ≥ s + t | X > s) = P(X ≥ s + t)/P(X > s)
Thus:
P(X ≥ 11 | X > 8) = P(X ≥ 3) = e^(−0.5×3) = e^(−1.5) ≈ 0.2231
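A simulation sketch of this memoryless property for λ = 1/2 (illustration only; the conditional and unconditional frequencies should both approach e^(−1.5) ≈ 0.2231):

```python
# P(X ≥ 11 | X > 8) vs. P(X ≥ 3) for Exp(λ = 1/2), by Monte Carlo.
import random

lam, n = 0.5, 200_000
samples = [random.expovariate(lam) for _ in range(n)]

survived_8 = [x for x in samples if x > 8]
cond = sum(x >= 11 for x in survived_8) / len(survived_8)
uncond = sum(x >= 3 for x in samples) / n
print(cond, uncond)      # both ≈ 0.2231
```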

2.4 Gamma Distribution: a generalization of the Exponential distribution.

A random variable X is said to follow a Gamma distribution with shape parameter k > 0 and rate parameter λ > 0, written as:
X ~ Gamma(k, λ)

f(x; k, λ) = λ^k x^(k−1) e^(−λx) / Γ(k), for x ∈ [0, ∞); λ, k > 0

where the parameter k is called the shape parameter, the parameter λ is called the rate parameter (the same rate we used in the Exponential distribution), and Γ(k) = (k − 1)! for integer k.

The Gamma function Γ(k) is defined by:

Gamma(k) = Γ(k) = ∫₀^∞ x^(k−1) e^(−x) dx, and ∫₀^∞ x^(k−1) e^(−λx) dx = Γ(k)/λ^k
• Suppose a system consists of one original and (r − 1) spare components such that, in case of failure of the original component, one of the (r − 1) spare components can be used.
• The process continues until we use the last component.
• When the last component fails, the whole system fails.
• Let X₁, X₂, X₃, …, X_r be the lifetimes of the r components.
• Let each of the random variables X₁, X₂, X₃, …, X_r have the exponential distribution with parameter λ, and also be probabilistically independent.
Then the lifetime (time until failure) of the entire system is given by

T = ∑_{i=1}^{r} Xᵢ

and this system lifetime has a Gamma distribution, that is, T ~ Gamma(r, λ).
Thus, the sum of k independent exponential random variables has a Gamma distribution.

Special Cases:
(i) If we take k = 1, the pdf of the Gamma distribution becomes the pdf of an Exponential distribution.

(ii) The Chi-square distribution is a special case of the Gamma distribution with k = α/2, λ = 1/2 (where α is the degrees of freedom).
We know that the PDF of the Gamma distribution is:

f(x; k, λ) = λ^k x^(k−1) e^(−λx) / Γ(k), x ≥ 0; λ, k > 0

If X ~ χ²_α, then the PDF of the Chi-squared distribution χ²_α with α degrees of freedom is:

f(x; α/2, 1/2) = f(x; α) = (1/2)^(α/2) x^(α/2 − 1) e^(−x/2) / Γ(α/2) for x ≥ 0; α > 0
              = 0 otherwise

Continuous probability distributions are used in modeling measurement errors (Normal), time to failure (Exponential), and continuous random durations (Gamma).

Parameters (Mean and Variance) of the Gamma Distribution
Mean of the Gamma distribution: E(X) = k/λ
Variance of the Gamma distribution: Var(X) = k/λ²
Standard deviation of the Gamma distribution: S.D. = √k/λ

Example-29: Given a Gamma random variable X with r = 3 and λ = 2, compute E(X), Var(X), and P(X ≤ 1.5 years).
Solution: We know that the pdf of the Gamma distribution is:

f(x; r, λ) = λ^r x^(r−1) e^(−λx) / Γ(r); x ≥ 0; λ, r > 0

For r = 3 and λ = 2:

f(x; r, λ) = 4x² e^(−2x); x ≥ 0
Mean = r/λ = 3/2
Variance = σ² = E[X²] − (E[X])² = r/λ² = 3/4 = 0.75

The CDF is:

F(x) = 1 − e^(−λx)(1 + λx + (λx)²/2!)

P(X ≤ 1.5 years) = ∫₀^1.5 f(x) dx = ∫₀^1.5 4x² e^(−2x) dx = F(1.5) ≈ 0.57681.

We know that the CDF of a Gamma random variable X ~ Gamma(r, λ) is

P(X ≤ x) = (1/Γ(r))·γ(r, λx)

where Γ(r) is the Gamma function, and γ(r, λx) = ∫₀^(λx) t^(r−1) e^(−t) dt is the lower incomplete Gamma function.

P(X ≤ 1.5) = ∫₀^1.5 f(x) dx = ∫₀^1.5 4x² e^(−2x) dx = (1/Γ(3))·γ(3, 3) = (1/2)·γ(3, 3)

For a Gamma distribution with integer r, the CDF can be expressed as:

P(X ≤ x) = 1 − e^(−λx) ∑_{k=0}^{r−1} (λx)^k/k!

P(X ≤ 1.5) = 1 − e^(−3) ∑_{k=0}^{2} 3^k/k! = 1 − e^(−3)(1 + 3 + 4.5) = 1 − 8.5/e³ ≈ 0.5768
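The same values follow from SciPy's Gamma distribution, which is parameterized by shape a = r and scale = 1/λ (a sketch, not part of the original solution):

```python
# Example-29 check: shape r = 3, rate λ = 2, i.e. scale = 1/2.
from scipy.stats import gamma

print(gamma.cdf(1.5, a=3, scale=1/2))    # ≈ 0.5768
print(gamma.mean(a=3, scale=1/2),        # r/λ  = 1.5
      gamma.var(a=3, scale=1/2))         # r/λ² = 0.75
```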

Example-30: In a certain city, the daily consumption of electric power in millions of kilowatt-hours can be treated as a random variable having a Gamma distribution with parameters λ = 1/2 and r = 3. If the power plant of this city has a daily capacity of 12 million kilowatt-hours, what is the probability that this power supply will be inadequate on any given day?

Solution: We know that the CDF of a Gamma random variable X ~ Gamma(r, λ) is

P(X ≤ x) = (1/Γ(r))·γ(r, λx) = 1 − e^(−λx) ∑_{k=0}^{r−1} (λx)^k/k!

Here λx = 12/2 = 6, so

P(X > 12) = 1 − P(X ≤ 12) = e^(−6)(1 + 6 + 18) = 25e^(−6) ≈ 0.0620

Table: Summary of the Mean and Variance for some important Continuous Probability Distributions:

Probability Distribution | PDF | Mean = μ = E[X] | Variance = Var(X) = σ² = E[X²] − (E[X])²
Uniform | f(x) = 1/(b − a), a ≤ x ≤ b | (a + b)/2 | (b − a)²/12
Normal (Gaussian), X ~ N(μ, σ²) | f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), x ∈ R, μ ∈ R, σ > 0 | μ | σ²
Exponential | f(x) = λe^(−λx), x > 0 | 1/λ | 1/λ²
Gamma | f(x; k, λ) = λ^k x^(k−1) e^(−λx)/Γ(k), x ≥ 0; λ, k > 0 | k/λ | k/λ²

Probability Mass Function (PMF):
Example-31: Check whether the function defined below is a probability mass function or not.

X 0 1 2 3

f(X=x) 0.1 0.4 0.2 0.3

Example-32: A discrete random variable X has following probability distribution.

X 0 1 2 3 4 5

f(X=x) 0 k 0.2 2k 0.3 2k

Then find (1) k (2) P(X<3) (3) P(X≥3) (4) P(2≤X≤5).


Example-33: From a lot of 10 items, 3 items are defective. A sample of 4 items is chosen at random. If the discrete random variable X represents the number of defective items in the sample, find its probability distribution.

Solution: We are given:

• Total items = 10
• Number of defective items = 3
• Number of non-defective items = 7
• Sample size = 4
• Let X be the number of defective items in the sample.

So, X can take values: 0, 1, 2, 3


(Since we are choosing 4 items from 10 and only 3 are defective, we cannot get more than 3
defectives.)

Step 1: Total number of ways to choose 4 items from 10:

C(10, 4) = 210
Step 2: Find probabilities for each value of X.
We use the hypergeometric probability:
P(X = x) = C(K, x)·C(N − K, n − x) / C(N, n); max(0, n − (N − K)) ≤ x ≤ min(n, K)
Where:
N: total number of items (population size)
K: number of successes in the population
n: number of items drawn (sample size)
X: number of successes in the sample

P(X = x) = C(3, x)·C(7, 4 − x) / C(10, 4); x = 0, 1, 2, 3
So, the probability distribution is:
X: 0, 1, 2, 3
P(X): 1/6, 1/2, 3/10, 1/30
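A verification sketch with SciPy (argument order: population size N = 10, number of successes K = 3, sample size n = 4; the Fraction conversion is only to print the exact values):

```python
# Example-33 check against scipy.stats.hypergeom.
from fractions import Fraction
from scipy.stats import hypergeom

rv = hypergeom(10, 3, 4)
for x in range(4):
    print(x, Fraction(float(rv.pmf(x))).limit_denominator())
    # 1/6, 1/2, 3/10, 1/30
```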

Cumulative distribution function (CDF) when X is a discrete random variable:

How do we find the CDF if the PMF is given?
We know that P(X = xᵢ) = p(xᵢ).
Let X be a discrete random variable which takes the values x₁, x₂, x₃, … such that x₁ < x₂ < x₃ < …; then the cumulative distribution function F(x) is defined as

F(xₙ) = P(X ≤ xₙ) = ∑_{i=1}^{n} p(xᵢ) = p(x₁) + p(x₂) + ⋯ + p(xₙ)

CDF of a Binomial Variate

Let X ~ Bin(n, p). Its PMF (probability mass function) is:
P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, 2, …, n

F_X(x_m) = P(X ≤ x_m) = ∑_{i=0}^{m} P(X = i) = ∑_{i=0}^{m} C(n, i) p^i (1 − p)^(n−i), m = 0, 1, 2, …, n

Properties of CDF
(1) 𝐹(𝑥𝑛 ) = 𝑝(𝑥1 ) + 𝑝(𝑥2 ) + ⋯ + 𝑝(𝑥𝑛 )
(2) ∑𝑛𝑖=1 𝑝(𝑥𝑖 ) = 1
(3) 0 ≤ 𝐹(𝑥𝑖 ) ≤ 1, 𝑖 = 1,2, … , 𝑛
(4)P(𝑎 < 𝑋 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎)

Example-34: A discrete random variable X takes the values –3, –2, –1, 0, 1, 2, 3, such that
P(X = 0) = P(X > 0) = P(X < 0) and P(X = –3) = P(X = –2) = P(X = –1) = P(X = 1) = P(X =
2) = P(X = 3). Obtain the probability distribution and the cumulative distribution function of
X.
Solution: Given conditions:

1. 𝑃(𝑋 = 0) = 𝑃(𝑋 > 0) = 𝑃(𝑋 < 0).


2. All nonzero values have the same probability:

𝑃(𝑋 = −3) = 𝑃(𝑋 = −2) = 𝑃(𝑋 = −1) = 𝑃(𝑋 = 1) = 𝑃(𝑋 = 2) = 𝑃(𝑋 = 3).

Let 𝑃(𝑋 = −3) = 𝑃(𝑋 = −2) = 𝑃(𝑋 = −1) = 𝑃(𝑋 = 1) = 𝑃(𝑋 = 2) =


𝑃(𝑋 = 3) = 𝑎.

And 𝑃(𝑋 = 0) = 𝑏

X -3 -2 -1 0 1 2 3
𝑃(𝑋 = 𝑥) 𝑎 a a b a a a

From the given condition 1, P(X > 0) = 3a, so b = 3a.
Also, we know that the total probability is 1; this gives
6a + b = 1
6a + 3a = 9a = 1 ⇒ a = 1/9 and b = 3a = 1/3.

So, the probability distribution is:

X: −3, −2, −1, 0, 1, 2, 3
P(X = x): 1/9, 1/9, 1/9, 1/3, 1/9, 1/9, 1/9

The cumulative distribution function (CDF) is given by F(x) = P(X ≤ x):

X: −3, −2, −1, 0, 1, 2, 3
PMF p(x): 1/9, 1/9, 1/9, 1/3, 1/9, 1/9, 1/9
CDF F(x): 1/9, 2/9, 1/3, 2/3, 7/9, 8/9, 1
that is:
F(x) = 0 for x < −3
     = 1/9 for −3 ≤ x < −2
     = 2/9 for −2 ≤ x < −1
     = 1/3 for −1 ≤ x < 0
     = 2/3 for 0 ≤ x < 1
     = 7/9 for 1 ≤ x < 2
     = 8/9 for 2 ≤ x < 3
     = 1 for x ≥ 3
Note: the CDF is always non-decreasing and ends at 1.

Mean or Arithmetic Mean or Mathematical Expectation:

Let p(x) be the probability mass function; then the mean or average value (μ) of a discrete random variable X is called its expectation and is denoted by E(X):

μ = E(X) = ∑ᵢ xᵢ·p(xᵢ) = ∑ₓ x·p(x)

Note: If φ(x) is a function of the discrete random variable X, then the expectation of φ(x) is given by

E(φ(x)) = ∑ᵢ φ(xᵢ)·p(xᵢ) = ∑ₓ φ(x)·p(x)

Properties of Mean:

(1) 𝐸(𝑘) = 𝑘, where 𝑘 is a constant.
(2) 𝐸(𝑘𝑋) = 𝑘𝐸(𝑋)
(3) 𝐸(𝑎𝑋 ± 𝑏) = 𝑎𝐸(𝑋) ± 𝑏
(4) 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌) provided 𝐸(𝑋) and 𝐸(𝑌) exists
(5) 𝐸(𝑋𝑌) = 𝐸(𝑋)𝐸(𝑌) if X and Y are independent.

Variance:
The variance of the probability distribution of a discrete random variable X is given by
Var(X) = 𝜎 2 = 𝐸[(𝑋 − 𝜇)2 ]
= 𝐸(𝑋 2 − 2𝑋𝜇 + 𝜇 2 ) = 𝐸(𝑋 2 ) − 2𝜇𝐸(𝑋) + 𝜇 2 = 𝐸(𝑋 2 ) − 2𝜇𝜇 + 𝜇 2
= 𝐸(𝑋 2 ) − 𝜇 2 = 𝐸[𝑋 2 ] − (𝐸[𝑋])2

Properties of Variance:
(1) 𝑉𝑎𝑟(𝑘) = 0
(2) 𝑉𝑎𝑟(𝑋 + 𝑘) = 𝑉𝑎𝑟(𝑋)
(3) 𝑉𝑎𝑟(𝑘𝑋) = 𝑘 2 𝑉𝑎𝑟(𝑋)
(4) 𝑉𝑎𝑟(𝑎𝑋 ± 𝑏) = 𝑎2 𝑉𝑎𝑟(𝑋)
(5) 𝑉𝑎𝑟(𝑋 + 𝑌) = 𝑉𝑎𝑟(𝑋) + 𝑉𝑎𝑟(𝑌) + 2. 𝐶𝑜𝑣(𝑋, 𝑌)
If X and Y are independent random variables, then 𝐶𝑜𝑣(𝑋, 𝑌) = 0
(6) 𝑉𝑎𝑟(𝑋 − 𝑌) = 𝑉𝑎𝑟(𝑋) + 𝑉𝑎𝑟(𝑌) − 2. 𝐶𝑜𝑣(𝑋, 𝑌)
(7) 𝑉𝑎𝑟(𝑎𝑋 + 𝑏𝑌) = 𝑎2 𝑉𝑎𝑟(𝑋) + 𝑏 2 𝑉𝑎𝑟(𝑌) + 2𝑎𝑏. 𝐶𝑜𝑣(𝑋, 𝑌)
Where, 𝐶𝑜𝑣(𝑋, 𝑌) represents the covariance between random variables X and Y.
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
Or, equivalently:
Cov(X, Y) = E[XY] − E[X]·E[Y]
where
E[X] = ∑ₓ x·P(x) if X is a discrete random variable
     = ∫ x·f_X(x) dx if X is a continuous random variable
And
E[XY] = ∑ₓ ∑ᵧ xy·P(x, y) in the discrete case
      = ∬ xy·f_{X,Y}(x, y) dx dy in the continuous case

Example-35: Find the covariance between X and Y (Discrete Case):

Suppose joint probabilities:

X Y P(X,Y)
1 2 0.2
1 4 0.3
2 2 0.1
2 4 0.4
Compute E[X], E[Y], E[XY]:
E[X] = 1(0.2 + 0.3) + 2(0.1 + 0.4) = 1.5
E[Y] = 2(0.2 + 0.1) + 4(0.3 + 0.4) = 3.4
E[XY] = (1)(2)(0.2) + (1)(4)(0.3) + (2)(2)(0.1) + (2)(4)(0.4) = 5.2
Hence,
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − E[X]·E[Y] = 5.2 − (1.5)(3.4) = 0.1
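The same computation as a short Python sketch over the joint table (illustration only):

```python
# Covariance from the joint probability table of Example-35.
table = [(1, 2, 0.2), (1, 4, 0.3), (2, 2, 0.1), (2, 4, 0.4)]

ex  = sum(x * p for x, _, p in table)          # E[X]  = 1.5
ey  = sum(y * p for _, y, p in table)          # E[Y]  = 3.4
exy = sum(x * y * p for x, y, p in table)      # E[XY] = 5.2
print(exy - ex * ey)                           # Cov(X, Y) = 0.1
```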

Example-36: Find the covariance between X and Y (Continuous Case):


Let X and Y be continuous random variables with the joint probability density function
(pdf):
𝑓(𝑥, 𝑦) = 8𝑥𝑦 𝑓𝑜𝑟 0 < 𝑥 < 1,0 < 𝑦 < 1
First we verify that it is a valid joint pdf:

∫₀¹ ∫₀¹ 8xy dx dy = 2

So it is not a valid pdf, because the total probability is not 1. We divide it by 2.
The corrected joint pdf is
f(x, y) = 4xy for 0 < x < 1, 0 < y < 1

E[X] = ∫₀¹ ∫₀¹ x·f(x, y) dx dy = ∫₀¹ ∫₀¹ x·4xy dx dy = 2/3

E[Y] = ∫₀¹ ∫₀¹ y·f(x, y) dx dy = 2/3

E[XY] = ∫₀¹ ∫₀¹ xy·f(x, y) dx dy = 4/9

Hence,
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − E[X]·E[Y] = 4/9 − (2/3)(2/3) = 0

Standard deviation: SD = √Var(X) = 𝜎


Example-37: The probability distribution of a random variable X is given below. Find
(i) E(X), (ii) Var(X), (iii) E(2X–3), and (iv) Var (2X – 3).

X -2 -1 0 1 2

P(X=x) 0.2 0.1 0.3 0.3 0.1

Hint: (i) 0 (ii) 1.6 (iii) -3 (iv) 6.4

Example-38: The monthly demand of a product is known to have the following probability
distribution. Find the expected demand for the product. Also, compute the variance.

Demand (x): 1, 2, 3, 4, 5, 6, 7, 8
Probability p(x): 0.08, 0.12, 0.19, 0.24, 0.16, 0.1, 0.07, 0.04

Hint: E[X] = 4.06, Var(X) = 3.2164

Example-39: Let X be a random variable with E(X) = 10 and Var (X) = 25. Find the positive
values of a and b such that Y = aX – b has an expectation of 0 and a variance of 1.
Hint: a=1/5, b= 2

Probability Density Function (PDF):

Let X be a continuous random variable; then a function f(x) of the random variable X is called a probability density function if
(1) f(x) ≥ 0, −∞ < x < ∞
(2) ∫_{−∞}^{∞} f(x) dx = 1

Note: P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Cumulative Distribution Function (CDF) when X is a continuous random variable:
Let X be a continuous random variable; then the cumulative distribution function F(x) is defined as

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt

Properties of the CDF:
(1) F(−∞) = ∫_{−∞}^{−∞} f(t) dt = 0
(2) F(∞) = ∫_{−∞}^{∞} f(t) dt = 1
(3) 0 ≤ F(x) ≤ 1, −∞ < x < ∞
(4) P(a < X ≤ b) = F(b) − F(a)

Mean of a Continuous Random Variable:

Let f(x) be the probability density function; then the mean or average value (μ) of a continuous random variable X is called its expectation and is denoted by E(X):

μ = E(X) = ∫_{−∞}^{∞} x·f(x) dx

Note: If φ(x) is a function of the continuous random variable X, then the expectation of φ(x) is given by

E(φ(x)) = ∫_{−∞}^{∞} φ(x)·f(x) dx

Variance of a Continuous Random Variable:

The variance of the probability distribution of a continuous random variable X is given by
Var(X) = σ² = E[(X − μ)²]
       = E(X²) − μ²
       = ∫_{−∞}^{∞} x²·f(x) dx − μ²

Standard deviation: SD = √Var(X) = σ

Example-40: Check whether the following function f(x) is a probability density function or not. Also, find the probability that the variable having this density falls in the interval [1, 2].
f(x) = e^(−x) for x ≥ 0
     = 0 for x < 0
Hint: e^(−1) − e^(−2) ≈ 0.233

Example-41: For the following PDF of a R.V. X, find (i) the value of k, and the probabilities that a random variable having this probability density will take on a value (ii) between 0.1 and 0.2, and (iii) greater than 0.5.
f(x) = k(1 − x²) for 0 < x < 1
     = 0 otherwise
Hint: (i) k = 3/2 (ii) 0.1465 (iii) 5/16 = 0.3125

Example-42: A continuous random variable X has the following PDF f(x). Find a and b such that (i) P(X ≤ a) = P(X > a) and (ii) P(X > b) = 0.05, where 0 < a, b < 1.
f(x) = 3x² for 0 < x < 1
     = 0 otherwise
Hint: (i) 0.7937 (ii) 0.983
Example-43: Find the constant k such that the following function f(x) is a PDF. Also, find the cumulative distribution function F(x) and P(1 < X ≤ 2).
f(x) = kx² for 0 < x < 3
     = 0 otherwise
Solution: (i) k = 1/9 (ii) F(x) = x³/27 (iii) 7/27

Example-44: If the density function of a random variable X is given as below, find (i) the value of k, (ii) the expectation of X, (iii) the variance, (iv) the SD.
f(x) = kx(1 − x) for 0 < x < 1
     = 0 otherwise
Solution: (i) 6 (ii) 1/2 (iii) 1/20 (iv) 1/√20

Example-45: A continuous random variable X has the pdf defined by


𝑓(𝑥) = 𝐴 + 𝐵𝑥, 0 ≤ 𝑥 ≤ 1.
If the mean of the distribution is 1/3, find A and B.
Solution: 𝐴 = 2, 𝐵 = −2
Example-46: If the probability density function (PDF) of X is given as below, find the expected value of φ(x) = x² − 5x + 3.

f(x) = x/2 for 0 < x ≤ 1
     = 1/2 for 1 < x ≤ 2
     = (3 − x)/2 for 2 < x ≤ 3
     = 0 otherwise
Hint for Solution:

E[φ(x)] = ∫₀³ φ(x)·f(x) dx = −11/6

2nd method: E[φ(x)] = E[X²] − 5·E[X] + 3, with
E[X] = 3/2, E[X²] = 8/3
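A numerical cross-check of this expectation (a sketch assuming SciPy; the breakpoints 1 and 2 are passed to quad so the piecewise kinks are handled cleanly):

```python
# E[φ(X)] for the piecewise pdf of Example-46, via numeric integration.
from scipy.integrate import quad

def f(x):
    if 0 < x <= 1:  return x / 2
    if 1 < x <= 2:  return 1 / 2
    if 2 < x <= 3:  return (3 - x) / 2
    return 0.0

phi = lambda x: x**2 - 5 * x + 3
val, _ = quad(lambda x: phi(x) * f(x), 0, 3, points=[1, 2])
print(val, -11 / 6)             # both ≈ -1.8333
```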

Moments of Random Variables:


A moment of a random variable is a quantitative measure related to the shape and
characteristics (like center, spread, skewness, etc.) of its probability distribution. Moments
give us information about characteristics like central tendency, spread, variability,
skewness, and kurtosis.
They are especially useful in probability, statistics, and machine learning.

Definition of the rth Moment:

For a random variable X, the rth moment about the origin (also called the Raw moment) is:

𝜇𝑟 ′ = 𝐸[𝑋 𝑟 ]

For a random variable X, the rth moment about the mean (also called the Central moment) is:

𝜇𝑟 = 𝐸[(𝑋 − 𝜇)𝑟 ], where 𝜇 = 𝐸[𝑋]

Moment Type | Expression | Interpretation
1st moment about origin (raw moment) | E[X] | Mean (center of distribution); measures the central location.
2nd central moment | E[(X − μ)²] | Also known as Variance; measures dispersion.
3rd central moment | E[(X − μ)³] | Skewness (measures asymmetry).
4th central moment | E[(X − μ)⁴] | Kurtosis (measures tailedness (tail heaviness) or peakedness).

Kurtosis is a statistical measure that tells us how sharp or flat the peak of a distribution is,
compared to a normal (bell-shaped) distribution.
Kurtosis helps you understand: (i) Is the data peaked (with extreme values)? (ii) Or is it flat
and spread out?

Table: Types of Kurtosis
Type Description
Mesokurtic Normal kurtosis (like a bell curve), kurtosis=3

Leptokurtic High kurtosis → sharp peak, fat tails (outliers), kurtosis>3

Platykurtic Low kurtosis → flat peak, thin tails, kurtosis<3

Formula for Kurtosis

(i) Kurtosis for a sample:

Kurtosis = [n(n + 1) / ((n − 1)(n − 2)(n − 3))] · ∑_{i=1}^{n} ((xᵢ − x̄)/s)⁴ − 3(n − 1)² / ((n − 2)(n − 3))

Where:

• n = number of data points
• xᵢ = each data value
• x̄ = sample mean
• s = sample standard deviation, with s² = (1/(n − 1)) ∑ᵢ (xᵢ − x̄)²

The term ∑_{i=1}^{n} ((xᵢ − x̄)/s)⁴ captures the fourth moment about the mean.

Adjustment factors like n(n + 1)/((n − 1)(n − 2)(n − 3)) correct for bias in small samples.

Subtracting the second term gives the excess kurtosis (kurtosis relative to the normal distribution).
(ii) Kurtosis for a population: for a dataset with mean μ and standard deviation σ,

Kurtosis = (1/n) ∑_{i=1}^{n} ((xᵢ − μ)/σ)⁴

Where:
• μ = population mean
• σ = population standard deviation
Note: Excess Kurtosis = Kurtosis − 3, because the normal distribution has kurtosis = 3.
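A sketch computing the sample formula above directly and comparing it with SciPy's bias-corrected excess kurtosis (the data vector is arbitrary illustration data; the two values should agree):

```python
# Sample excess kurtosis: manual formula vs. scipy.stats.kurtosis.
import numpy as np
from scipy.stats import kurtosis

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(data)
s = data.std(ddof=1)                               # sample SD (n-1 denominator)
z4 = (((data - data.mean()) / s) ** 4).sum()       # fourth standardized moment

manual = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
print(manual, kurtosis(data, fisher=True, bias=False))
```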

Functions of Random Variables:


If 𝑋 is a random variable and 𝑔(𝑋) is a function of 𝑋, then 𝑌 = 𝑔(𝑋) is also a random
variable. The distribution and moments of Y depend on how g transforms X.
i. Distribution of a Function:

To find the distribution of 𝑌 = 𝑔(𝑋):

• If X is discrete, find P(Y = y) = P(X = x such that g(x) = y).
• If X is continuous, use transformation techniques (e.g., change of variables, Jacobians for multiple variables).

ii. Expectation of a Function OR Expected Value of a Function:

E[g(X)] = ∑ₓ g(x)·P(X = x) (discrete)
E[g(X)] = ∫_{−∞}^{∞} g(x)·f(x) dx (continuous)

iii. Variance of a Function:


𝑉𝑎𝑟(𝑔(𝑋)) = 𝐸[(𝑔(𝑋) − 𝐸[𝑔(𝑋)])2 ]
Examples-47:

1. If 𝑋 ∼ 𝑁(𝜇, 𝜎 2 ), then the 2nd moment about the mean, that is variance is σ2 .
2. If 𝑌 = 𝑋 2 , and 𝑋 ∼ 𝑁(0,1), then:
• 𝐸[𝑌] = 𝐸[𝑋 2 ] = 1
• Y follows a chi-squared distribution with 1 degree of freedom.

Law of Large Numbers (LLN)

The Law of Large Numbers is a fundamental theorem in probability that describes the result
of performing the same experiment for a large number of times.

Definition:
The Law of Large Numbers states that as the number of trials (or observations) increases,
the sample mean of the observed outcomes gets closer to the expected (theoretical) mean.
i.e. when a random experiment is repeated many times, the average result will tend to be
close to the expected value.

Mathematical Statement:
Let X₁, X₂, …, Xₙ be independent and identically distributed (i.i.d.) random variables with mean μ = E[Xᵢ]. Then:

X̄ₙ = (1/n) ∑_{i=1}^{n} Xᵢ → μ as n → ∞

That is, the sample mean X̄ₙ converges to the population mean μ as the number of observations n increases.

Markov Inequality:
For any random variable 𝑋 ≥ 0 and 𝜀 > 0: 𝑃(𝑋 > 𝜀) ≤ 𝐸[𝑋]/𝜀

Chebyshev's Inequality:
If 𝐸[𝑋] = 𝜇, and 𝑉𝑎𝑟(𝑋) = 𝜎² < ∞, then: 𝑃(|𝑋 − 𝜇| > 𝜀) ≤ 𝑉𝑎𝑟(𝑋)/𝜀²

Example-48: Let X₁, …, X₁₀₀₀ ~ Poisson(3). Find P(|X̄₁₀₀₀ − 3| > 0.2) using Chebyshev's inequality.

Solution: The variance of each Xᵢ is 3. For the sample mean X̄₁₀₀₀,

Var(X̄₁₀₀₀) = Var(Xᵢ)/n = σ²/n = 3/1000 = 0.003.

Chebyshev's inequality gives, for ε = 0.2,

P(|X̄₁₀₀₀ − 3| > 0.2) ≤ Var(X̄₁₀₀₀)/0.2² = (3/1000)/0.2² = 0.075.

Example-49: Tossing a Fair Coin

Let's define a random variable Xᵢ for the outcome of the i-th toss:

Xᵢ = 1 if Head
   = 0 if Tail

The expected value of Xᵢ is E[Xᵢ] = 0.5.

Let's toss the coin n times and compute the sample mean:

Tosses n | Number of Heads | Sample Mean X̄ₙ
10 | 6 | 0.6
100 | 49 | 0.49
500 | 248 | 0.496
1000 | 507 | 0.507
5000 | 2506 | 0.501

We can observe that as the number of tosses increases, the sample mean X̄ₙ approaches the theoretical mean E(X) = 0.5.
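A simulation sketch of this experiment (the head counts above are one possible realization; a fresh run produces similar, not identical, numbers):

```python
# Sample mean of fair-coin tosses drifting toward E[X] = 0.5.
import random

for n in (10, 100, 500, 1000, 5000):
    heads = sum(random.randint(0, 1) for _ in range(n))   # 1 = head, 0 = tail
    print(n, heads / n)
```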

Central Limit Theorem (CLT)

The Central Limit Theorem (CLT) is one of the cornerstone results in probability and
statistics. It explains why many distributions tend to look "normal" (bell-shaped) when you
average or sum a large number of random variables.
Statement of the Theorem
Informal Statement:

If you take a large number of independent and identically distributed (i.i.d.) random
variables, each with a finite mean and variance, then the distribution of their properly
normalized sum (or average) will tend toward a normal (Gaussian) distribution, regardless of
the original distribution of the variables.

Formal Statement (for i.i.d. random variables):

Let X₁, X₂, …, Xₙ be i.i.d. random variables with:
• Mean: μ = E[Xᵢ]
• Variance: σ² = Var(Xᵢ) (which is finite)
Define the sample mean:

X̄ₙ = (1/n) ∑_{i=1}^{n} Xᵢ

Take:

Zₙ = (X̄ₙ − μ)/(σ/√n); then Zₙ ~ N(0, 1) as n → ∞

This means the distribution of Zₙ approaches the standard normal distribution as n → ∞.

Note: In place of the transformation Z = (X − μ)/σ = (X − E[X])/√Var(X), we are taking Zₙ = (X̄ₙ − μ)/(σ/√n). It seems that we are taking a different transformation, but in fact we are using the same one. To see this, consider

X̄ₙ = (1/n)(X₁ + X₂ + ⋯ + Xₙ)

Then:

E[X̄ₙ] = E[(1/n)(X₁ + X₂ + ⋯ + Xₙ)]
 = (1/n)(E[X₁] + E[X₂] + ⋯ + E[Xₙ]) = (1/n)(μ + μ + ⋯ + μ) = μ

Var(X̄ₙ) = Var((1/n)(X₁ + X₂ + ⋯ + Xₙ)) = (1/n²)(Var(X₁) + Var(X₂) + ⋯ + Var(Xₙ))
 = (1/n²)·n·σ² = σ²/n

S.D.(X̄ₙ) = σ/√n

So: Zₙ = (X̄ₙ − E[X̄ₙ])/√Var(X̄ₙ) = (X̄ₙ − μ)/(σ/√n)

Example-50: Rolling a Die

Step 1: Population Information
• Random variable X: outcome of one die roll ∈ {1, 2, 3, 4, 5, 6}
• Mean of the population:

μ = (1/6) ∑_{i=1}^{6} xᵢ = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5

• Variance:

σ² = (1/6) ∑_{i=1}^{6} (xᵢ − μ)² = (1/6)(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25) ≈ 2.917

Step 2: Take Many Random Samples

• Take 1000 samples, each consisting of n = 30 die rolls.
• Compute the sample mean X̄ for each sample.
We would get 1000 sample means.

Step 3: Plot the Distribution of Sample Means

If we create a histogram of these 1000 sample means, we will observe:
• It looks approximately normal (bell-shaped).
• The mean of the sampling distribution is approximately μ = 3.5.
• The standard deviation of the sampling distribution is approximately:

σ/√n = √2.917/√30 ≈ 1.708/5.477 ≈ 0.312

So, even though the original population (die-roll outcomes) is uniform (not normal), the distribution of sample means is approximately normal for a sufficiently large sample size n (= 30 here), as guaranteed by the Central Limit Theorem.
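A simulation sketch of Steps 1-3 using only the standard library (the printed mean and standard deviation should come out near 3.5 and 0.312):

```python
# 1000 sample means of n = 30 die rolls; approximately N(3.5, 2.917/30).
import random
import statistics

means = [statistics.mean(random.randint(1, 6) for _ in range(30))
         for _ in range(1000)]
print(statistics.mean(means))    # ≈ 3.5
print(statistics.stdev(means))   # ≈ 0.312
```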

Example-51: A company finds that the time to process a customer request is normally distributed with mean μ = 20 minutes and standard deviation σ = 5 minutes. Find the probability that the average processing time of 36 customers is:
(i) less than 19 minutes, (ii) between 19 and 21 minutes.
Hint: Z = (X̄ₙ − μ)/(σ/√n); then look up the value in the standard normal table Φ(Z).
Solution: Z = (X̄ₙ − 20)/(5/6)
(i) For X̄ = 19: Z = (19 − 20)/(5/6) = −6/5 = −1.2
So, P(X̄ < 19) = P(Z < −1.2) = P(−∞ < Z < −1.2) = P(1.2 < Z < ∞)
 = P(0 < Z < ∞) − P(0 < Z < 1.2) = 0.5 − 0.3849 = 0.1151

(ii) P(19 < X̄ < 21) = P(−1.2 < Z < (21 − 20)/(5/6)) = P(−1.2 < Z < 1.2)
 = 2·P(0 < Z < 1.2) = 2 × 0.3849 = 0.7698

Appendix

1. Standard Normal Z-Table
[Table: standard normal probabilities; not reproduced here.]
2. Important Integral: ∫_{−∞}^{∞} x²·(1/√(2π))·e^(−x²/2) dx = 1; this is a known standard integral.

Let
φ(x) = (1/√(2π)) e^(−x²/2)

Recognize that it is an even function. We use the known result that the above integral is a classic Gaussian-type integral involving an even power of x, of the form:

∫_{−∞}^{∞} x²·e^(−ax²) dx = √π/(2a^(3/2)), for a > 0

Derivation Sketch: We know that the standard Gaussian integral is given by:

∫_{−∞}^{∞} e^(−ax²) dx = √(π/a)

Taking the derivative w.r.t. a under the integral sign:

(d/da) ∫_{−∞}^{∞} e^(−ax²) dx = ∫_{−∞}^{∞} (d/da) e^(−ax²) dx = ∫_{−∞}^{∞} −x²·e^(−ax²) dx

So:

∫_{−∞}^{∞} x²·e^(−ax²) dx = −(d/da) √(π/a)

After differentiating, we get:

∫_{−∞}^{∞} x²·e^(−ax²) dx = √π/(2a^(3/2))

Some other Important Integrals:

(i) ∫_{−∞}^{∞} x^(2n)·e^(−ax²) dx; a > 0:

∫_{−∞}^{∞} x^(2n)·e^(−ax²) dx = [(2n − 1)!!/(2a)ⁿ]·√(π/a)

where (2n − 1)!! is the double factorial:
(2n − 1)!! = (2n − 1)(2n − 3)⋯(3)(1)

3. The Gamma function is defined as:

Gamma(k) = Γ(k) = ∫₀^∞ x^(k−1) e^(−x) dx, and ∫₀^∞ x^(k−1) e^(−λx) dx = Γ(k)/λ^k, where Γ(k) = (k − 1)! for integer k.
(Doubt of Syllabus): Ex- 29, 30, 33,
