Notes Unit- 2_Random Variable Probability Distribution
Random Variable
A random variable (RV) is a variable that takes numerical values based on the outcomes of
a random experiment. Generally it is denoted by X.
Figure: Classification of random variables into Discrete random variables and Continuous random variables.
Example of a continuous random variable: the distance travelled by a car in an hour, e.g., 60.5 km.
Probability Distribution:
When drawing conclusions from data, there will always be a risk of drawing false conclusions or making false predictions.
• We need to be sufficiently confident before taking any decision, by setting confidence levels.
• Confidence levels are often set at 90 percent, 95 percent, or 99 percent.
1. Probability Distributions:
• Many probability distributions can be defined by factors such as the mean and
standard deviation of the data.
• Each probability distribution has a formula.
• There are different shapes, models and classifications of probability distributions.
They are often classified into two categories, depending on the nature of the random variable:
• Discrete Probability Distributions.
• Continuous Probability Distributions.
Example-1: A fair coin is tossed twice. Let X be the number of heads obtained.
Sample space: S = {HH, HT, TH, TT}
Possible values of X: 0, 1, 2

X          0     1     2
P(X = x)   1/4   1/2   1/4

This table is the Probability Mass Function (PMF). It satisfies:
• 0 ≤ P(X = x) ≤ 1
• ∑_x P(X = x) = ∑_{i=1}^{n} P(X = x_i) = ∑_{i=1}^{n} p(x_i) = 1
Example-2: Consider an example of tossing a fair coin 3 times.
Define Random variable X: The number of heads obtained.
i.e. X denotes the number of heads while a fair coin is tossed 3 times.
Probability Distribution:
Let X be a random variable. Then a probability function maps the possible values of X to their respective probabilities of occurrence, P(X = x) = p(x).
Example-3:
Let X = time taken to finish a task, uniformly distributed between 0 and 1 hour:

f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Then,

P(0.2 < X < 0.5) = ∫_{0.2}^{0.5} f(x) dx = 0.3
Note: For continuous RVs, 𝑃(𝑋 = 𝑎) = 0, but 𝑃(𝑎 < 𝑋 < 𝑏) > 0
Summary:
Random Variable Type   Function Used    Key Feature
Discrete R.V.          PMF: P(X = x)    Probabilities for exact values
Continuous R.V.        PDF: f(x)        Probabilities over intervals only
Important discrete probability distributions include the Bernoulli distribution, Binomial distribution, Negative Binomial distribution, Geometric distribution, Hypergeometric distribution, Poisson distribution, etc.
1.2 Binomial Distribution: The Binomial Distribution is a discrete probability distribution that models the number of successes in n (a fixed number of) independent Bernoulli trials. The random variable X represents the number of successes in the n trials.
Conditions for a Binomial Distribution:
• The experiment consists of a fixed number n of trials.
• Each trial has only two possible outcomes: success or failure.
• The probability of success p is the same for every trial (with q = 1 − p).
• The trials are independent.
Example-1: A coin is tossed 5 times. What is the probability of getting exactly 3 heads?
Solution: Here n = 5 and p = 1/2, so P(X = 3) = C(5, 3)·(1/2)³·(1/2)² = 10/32 = 0.3125.
Example-2: Decide whether each experiment given below is a binomial experiment. If it is, specify the values of n, p, and q, and list the possible values of the random variable X. If it is not a binomial experiment, explain why.
(i). You randomly select a card from a deck of cards, and note if the card is a king. You then
put the card back and repeat this process 8 times.
Yes, this is a binomial experiment. Each of the 8 selections represents an independent trial because the card is replaced before the next one is drawn. There are only two possible outcomes: (i) the card is a king, or (ii) the card is not a king. Here n = 8, p = 4/52 = 1/13, q = 12/13, and X can take the values 0, 1, …, 8.
(ii). You roll a die 10 times and note the number the die lands on.
No, this is not a binomial experiment. While each trial (roll) is independent, there are more than two possible outcomes: 1, 2, 3, 4, 5, and 6.
So, Binomial Distribution is a discrete probability distribution that is used for data which can
only take one of two values, i.e.
• Pass or fail.
• Yes or no.
• Good or defective.
The Binomial Distribution allows us to compute the probability of a given number of successes in a given number of trials.
"Success" is just a label: it can stand for whichever of the two outcomes you choose to count, whether that outcome is desirable or not.
Recurrence Relation for the Binomial Distribution: The PMF of the Binomial distribution is:

P(X = x) = C(n, x)·p^x·(1 − p)^(n−x)

P(X = x + 1) = C(n, x + 1)·p^(x+1)·(1 − p)^(n−x−1)

Dividing the second equation by the first:

P(X = x + 1)/P(X = x) = p(x + 1)/p(x) = [(n − x)/(x + 1)]·(p/q)

P(X = x + 1) = [(n − x)/(x + 1)]·(p/q)·p(x)

where C(n, x) denotes the binomial coefficient "n choose x".
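A minimal sketch (not part of the notes) of how this recurrence can be used in practice: starting from P(X = 0) = q^n, each successive probability is obtained by one multiplication, and the result agrees with the direct formula C(n, x)·p^x·q^(n−x).

```python
# Binomial PMF via the recurrence P(X = x+1) = ((n - x)/(x + 1)) * (p/q) * P(X = x),
# checked against the direct formula C(n, x) p^x q^(n-x).
from math import comb

def binomial_pmf_recurrence(n, p):
    q = 1 - p
    probs = [q ** n]                      # P(X = 0) = q^n
    for x in range(n):
        probs.append(probs[-1] * (n - x) / (x + 1) * (p / q))
    return probs

n, p = 5, 0.5
rec = binomial_pmf_recurrence(n, p)
direct = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
print(rec[3])                                                # 0.3125, as in Example-1
print(all(abs(a - b) < 1e-12 for a, b in zip(rec, direct)))  # True
```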
Example-3: The mean and standard deviation of a binomial distribution are 5 and 2. Determine the distribution.
Hint for Solution: Let X be a discrete random variable that follows a Binomial distribution, that is, X ~ Bin(n, p).
We know that the Probability Mass Function (PMF) of the Binomial distribution is given by:

P(X = x) = C(n, x)·p^x·(1 − p)^(n−x), x = 0, 1, 2, …, n

Mean = μ = E(X) = np
Variance of the random variable X = Var(X) = σ² = npq
Here np = 5 and npq = 4, so q = 4/5, p = 1/5, and n = 25; hence X ~ Bin(25, 1/5).
Example-4: The mean and variance of a binomial variate are 8 and 6. Find P(X ≥ 2).
Example-5: With the usual notation, find p for a binomial distribution if n = 6 and 9P(X = 4) = P(X = 2).
Example-6: Two dice are thrown five times. Find the probability of getting the sum 7 (i) at least once, (ii) exactly two times, and (iii) P(1 < X < 5).
Example-7: The incidence of corona in an industry is such that a worker has a 20% chance of suffering from it. What is the probability that, out of 6 workers chosen at random, four or more will suffer from corona?
Example-8: If hens of a certain breed lay eggs on 5 days a week on average, find on how many days during a season of 100 days a poultry keeper with 5 hens of this breed can expect to receive at least 4 eggs.
1.4 Geometric Distribution: The random variable X counts the number of Bernoulli trials required until the 1st success occurs.
Let p be the probability of success in each trial, and q = 1 − p be the probability of failure. Then the probability mass function (PMF) of the random variable X is given by:

P(X = x) = p·(1 − p)^(x−1), x = 1, 2, …
Mean of the Geometric distribution:

μ = E(X) = ∑_x x·P(X = x) = ∑_{i=1}^{∞} i·p·(1 − p)^(i−1) = ∑_{i=1}^{∞} i·p·q^(i−1)

Writing i = (i − 1) + 1:

E(X) = ∑_{i=1}^{∞} p·q^(i−1) + ∑_{i=1}^{∞} (i − 1)·p·q^(i−1) = 1 + q·∑_{j=1}^{∞} j·p·q^(j−1) = 1 + q·E(X)

(1 − q)·E[X] = 1, which implies E[X] = 1/p.

In other words, if independent trials having a common probability p of success are performed until the first success occurs, then the expected number of required trials equals 1/p. For instance, the expected number of rolls of a fair die needed to obtain the value 1 is 6.
Second moment of the Geometric distribution:

E[X²] = ∑_{i=1}^{∞} (i − 1 + 1)²·p·q^(i−1) = ∑_{i=1}^{∞} (i − 1)²·p·q^(i−1) + 2·∑_{i=1}^{∞} (i − 1)·p·q^(i−1) + ∑_{i=1}^{∞} p·q^(i−1)

= q·∑_{j=0}^{∞} j²·p·q^(j−1) + 2·∑_{j=1}^{∞} j·p·q^j + ∑_{j=0}^{∞} p·q^j

= q·E[X²] + 2·q·E[X] + 1

p·E[X²] = (2q/p) + 1

E[X²] = 2q/p² + 1/p = (2q + p)/p² = (q + 1)/p²

So,

Var(X) = E[X²] − (E[X])² = (q + 1)/p² − 1/p² = q/p²
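A quick simulation sketch (not from the notes) that checks these formulas: sampling a Geometric(p) variable by counting Bernoulli trials until the first success, the sample mean should approach 1/p and the sample variance q/p².

```python
# Simulate Geometric(p) and compare sample mean/variance with 1/p and q/p^2.
import random

def geometric_sample(p):
    """Number of Bernoulli(p) trials until the first success."""
    trials = 1
    while random.random() >= p:
        trials += 1
    return trials

p, n = 0.25, 200_000
data = [geometric_sample(p) for _ in range(n)]
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / n
print(mean, 1 / p)            # both close to 4
print(var, (1 - p) / p**2)    # both close to 12
```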
1.5 Hypergeometric Distribution:
The hypergeometric distribution models the number of successes in a sample drawn
without replacement from a finite population.
We write:
𝑋~Hypergeometric(𝑁, 𝐾, 𝑛)
Where:
• N = population size
• K = number of success items in the population
• n = sample size (drawn without replacement)

P(X = x) = C(K, x)·C(N − K, n − x) / C(N, n); max(0, n − (N − K)) ≤ x ≤ min(n, K)

Explanation of the above PMF:
• C(K, x): ways to choose x successes from the K successful items
• C(N − K, n − x): ways to choose the remaining n − x items from the N − K failures
• C(N, n): ways to choose a sample of n items from the N items
Example-9: From a population of 20 items, 7 are defective. If 5 items are selected without replacement, let X be the number of defective items in the sample; then X ~ Hypergeometric(N = 20, K = 7, n = 5).
Here, X is a discrete random variable because it can take only the integer values 0, 1, 2, …, up to min(K, n) = min(7, 5) = 5.
Example-10: A box contains 10 balls: 4 red (successes) and 6 white (failures). 3 balls are chosen at random without replacement. Let X be the number of red balls drawn.
PMF:

P(X = x) = C(4, x)·C(6, 3 − x) / C(10, 3); x = 0, 1, 2, 3
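A small sketch (assumed, not from the notes) evaluating this PMF for Example-10 with math.comb:

```python
# Hypergeometric PMF for Example-10: N = 10 balls, K = 4 red, n = 3 drawn.
from math import comb

def hypergeom_pmf(N, K, n, x):
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

for x in range(4):
    print(x, hypergeom_pmf(10, 4, 3, x))
# 0 -> 20/120 = 1/6, 1 -> 60/120 = 1/2, 2 -> 36/120 = 3/10, 3 -> 4/120 = 1/30
```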
1.6 Poisson Distribution: The Poisson distribution is used when we want to count the number of events in a fixed interval. Its PMF is:

P(X = x) = p(x) = λ^x·e^(−λ) / x!, x = 0, 1, 2, …
Typical examples of Poisson variables:
• Number of defective bulbs produced by a reputed company.
• Number of telephone calls per minute.
• Number of cars passing a certain point in one minute.
• Number of printing mistakes per page in a large text.
• Number of persons born blind per year in a large city.
Observation: The Poisson distribution is the limiting case of the Binomial distribution under the following conditions:
(i) The number of trials is infinitely large, i.e., n → ∞.
(ii) The probability of success p in each trial is very small, i.e., p → 0.
(iii) np = λ is finite, where λ is a constant.
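An illustrative sketch (not from the notes) of this limit: holding np = λ fixed while n grows, the Binomial(n, λ/n) probabilities approach the Poisson(λ) probabilities.

```python
# Binomial(n, lam/n) -> Poisson(lam) as n grows, for P(X = 3) with lam = 2.
from math import comb, exp, factorial

lam, x = 2.0, 3
poisson = lam**x * exp(-lam) / factorial(x)
for n in (10, 100, 1000, 10000):
    p = lam / n
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(n, binom)
print("Poisson limit:", poisson)   # approximately 0.1804
```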
For the Poisson distribution:
Mean = λ
Var(X) = λ
Standard deviation = √variance = √λ
Example-11: If the mean of a Poisson variable is 1.8, find (i) P(X > 1), (ii) P(X = 5), and (iii) P(0 < X < 5).
Example-12: If a random variable has a Poisson distribution such that P(X = 1) = P(X = 2), find (i) the mean of the distribution, (ii) P(X ≥ 1), and (iii) P(1 < X < 4).
Solution: 𝜆 = 2
Example-13: If X is a Poisson variate such that P(X = 0) = P(X = 1), find P(X = 0) and using
recurrence relation formula, find the probabilities at X = 1, 2, 3, 4, and 5.
Example-15: An insurance company insured 4000 people against loss of both eyes in a car accident. Based on previous data, the rates were computed on the assumption that, on average, 10 persons in 100000 will have car accidents each year that result in this type of injury. What is the probability that more than 3 of the insured will collect on their policy in a given year?
Solution: Using the Poisson approximation to the Binomial:

λ = n·p = 4000·(10/100000) = 0.4

P(X ≥ 4) = 1 − P(X ≤ 3) ≈ 0.000776252
Example-16: Suppose a book of 585 pages contains 43 typographical errors. If these errors
are randomly distributed throughout the book, what is the probability that 10 pages, selected
at random, will be free from errors?
Solution: This problem can be modeled using a hypergeometric distribution, or approximated with a binomial model because the number of pages is large.
The probability that a randomly selected page is error-free is approximately:
p = 542/585 ≈ 0.9265
P(all 10 pages error-free) = p^10 ≈ 0.464717369
An exact answer can also be obtained from the hypergeometric model, and it is close to this approximation.
Table: What X represents in each distribution:

Probability Distribution   X represents what?
Bernoulli                  The outcome (success or failure) of a single trial of a binary (two-outcome) experiment
Binomial                   Number of successes in n Bernoulli trials
Negative Binomial          Total number of Bernoulli trials required to get r successes
Geometric                  Number of Bernoulli trials required until the 1st success
Hypergeometric             Number of successes in a sample drawn without replacement from a finite population
Poisson                    Number of events in a fixed interval
Table: Summary of the Mean and Variance for some important Discrete Probability Distributions (Mean = μ = E[X]; Var(X) = σ² = E[X²] − (E[X])²):

• Bernoulli: P(X = x) = p^x·(1 − p)^(1−x), x = 0, 1; Mean = p; Variance = p(1 − p) = pq
• Binomial: P(X = x) = C(n, x)·p^x·(1 − p)^(n−x), x = 0, 1, 2, …, n; Mean = np; Variance = npq
• Negative Binomial: P(X = x) = C(x − 1, r − 1)·p^r·(1 − p)^(x−r), x = r, r + 1, r + 2, …; Mean = r/p; Variance = rq/p²
• Geometric: P(X = x) = p·(1 − p)^(x−1), x = 1, 2, …; Mean = 1/p; Variance = q/p²
• Hypergeometric: P(X = x) = C(K, x)·C(N − K, n − x)/C(N, n), max(0, n − (N − K)) ≤ x ≤ min(n, K); Mean = n·(K/N); Variance = n·(K/N)·(1 − K/N)·(N − n)/(N − 1)
• Poisson: p(x) = λ^x·e^(−λ)/x!, x = 0, 1, 2, …; Mean = λ; Variance = λ
The Moment Generating Function (MGF) of a random variable is a tool used to generate moments (mean, variance, etc.) and is defined as M_X(t) = E[e^(tX)]. For the Geometric distribution:

M_X(t) = E[e^(tX)] = ∑_{x=1}^{∞} e^(tx)·P(X = x) = p·e^t / (1 − q·e^t), for t < −ln(q)
The nth moment about the origin is obtained by:

μ′_n = E[X^n] = dⁿM_X(t)/dtⁿ evaluated at t = 0
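A short sketch (assumed, not from the notes) of this procedure with sympy: differentiating the geometric MGF at t = 0 recovers E[X] = 1/p, E[X²] = (q + 1)/p², and Var(X) = q/p², matching the derivation above.

```python
# Moments of Geometric(p) from its MGF via symbolic differentiation.
import sympy as sp

t, p = sp.symbols('t p', positive=True)
q = 1 - p
M = p * sp.exp(t) / (1 - q * sp.exp(t))   # geometric MGF, valid for t < -ln(q)

m1 = sp.simplify(sp.diff(M, t).subs(t, 0))       # E[X]   -> 1/p
m2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))    # E[X^2] -> (2 - p)/p**2 = (q + 1)/p**2
var = sp.simplify(m2 - m1**2)                    # Var(X) -> (1 - p)/p**2 = q/p**2
print(m1, m2, var)
```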
2. Continuous Probability Distributions: For a continuous random variable X with probability density function (PDF) f(x):

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

∫_{−∞}^{+∞} f(x) dx = 1 (Total probability = 1)
Important Notes:
• f(x) is NOT the probability that 𝑋 = 𝑥 (in fact, 𝑃(𝑋 = 𝑥) = 0 for continuous
variables).
• Instead, f(x) represents the density at point X=x, not the probability.
Uniform Distribution: A continuous random variable X is uniformly distributed on [a, b] if its PDF is:

f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.
Normal Distribution: The normal distribution can be entirely described by its mean (μ) and standard deviation (σ). We write X ~ N(μ, σ²); i.e., a continuous random variable X is said to be normally distributed if its probability density function (pdf) is given by:

f(x) = (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²)), where −∞ < x < ∞, −∞ < μ < ∞, 0 < σ
The curve representing the normal distribution is called the normal curve.
That means, if X is a normal random variable with mean μ and standard deviation σ, then the probability of X lying in the interval (x₁, x₂) is given by:

P(x₁ ≤ X ≤ x₂) = ∫_{x₁}^{x₂} f(x) dx
Properties of the Normal Distribution:
(iv) The ordinate at X = μ divides the area under the normal curve into two equal parts, i.e.

∫_{−∞}^{μ} f(x) dx = ∫_{μ}^{∞} f(x) dx = 1/2
(v) The value of f(x) is always non-negative for all values of X, because the whole curve lies
above the 𝑋 axis.
(vi) The area under the normal curve is distributed as follows:
(a) The area between the ordinates at 𝜇 − 𝜎 and 𝜇 + 𝜎 is 68.27%
(b) The area between the ordinates at 𝜇 − 2𝜎 and 𝜇 + 2𝜎 is 95.45%
(c) The area between the ordinates at μ − 3σ and μ + 3σ is 99.73%
Important Observation:
The mean of a random variable acts as the centre of gravity of the area under the density curve f(x): the area to its left balances the area to its right. That is:

∫_{−∞}^{μ} (μ − x)·f(x) dx = ∫_{μ}^{∞} (x − μ)·f(x) dx
Important Integral: ∫_{−∞}^{∞} x²·(1/√(2π))·e^(−x²/2) dx = 1; this is a known standard integral.

This is the second moment about the mean of the standard normal distribution, i.e., Z ~ N(0, 1). That is, E(Z²) = 1.

Let φ(x) = (1/√(2π))·e^(−x²/2).

Recognize that φ(x) is an even function. We use the known result that the above integral is a classic Gaussian-type integral involving an even power of x, of the form:

∫_{−∞}^{∞} x²·e^(−ax²) dx = √π / (2·a^(3/2)), for a > 0
Some other Important Integrals:
(i) Standard Gaussian-type integral:

∫_{−∞}^{∞} e^(−ax²) dx = √(π/a)
Standard Normal Variable: Define Z = (X − μ)/σ. Then:

E(Z) = E((X − μ)/σ) = (1/σ)·[E(X) − μ] = 0

Var(Z) = Var((X − μ)/σ) = (1/σ²)·Var(X − μ) = (1/σ²)·Var(X) = 1

For x₁ < μ < x₂, with z₁ = (μ − x₁)/σ and z₂ = (x₂ − μ)/σ:

P(x₁ ≤ X ≤ x₂) = P(−z₁ ≤ Z ≤ z₂) = P(−z₁ ≤ Z ≤ 0) + P(0 ≤ Z ≤ z₂) = P(0 ≤ Z ≤ z₁) + P(0 ≤ Z ≤ z₂)
Some other cases for P(X > x₁), where z₁ = (x₁ − μ)/σ:
(I) If z₁ > 0:
P(X > x₁) = P(Z > z₁) = 0.5 − P(0 ≤ Z ≤ z₁)
(II) If z₁ < 0:
P(X > x₁) = P(Z > z₁) = 0.5 + P(z₁ < Z < 0) = 0.5 + P(0 < Z < |z₁|)
Uses of the Normal Distribution:
1. The normal distribution can be used to approximate binomial and Poisson distributions.
2. It is used extensively in sampling theory. It helps to estimate parameters from statistics and to find confidence limits of the parameters.
3. It is widely used in testing statistical hypotheses and tests of significance, in which it is assumed that the population from which the samples have been drawn has a normal distribution.
4. It serves as a guiding instrument in the analysis and interpretation of statistical data.
5. It can be used for smoothing and graduating a distribution which is not normal, simply by constructing a normal curve.
Example-17: What is the probability that a standard normal variate Z will be (i) greater than 1.09, (ii) less than or equal to −1.65, (iii) lying between −1 and 1.96, (iv) lying between 1.25 and 2.75?
Solution: (i) 0.1379 (ii) 0.0495 (iii) 0.8163 (iv) 0.1026
Example-18: If X is a normal variate with a mean of 30 and an SD of 5, find the probabilities that (i) 26 ≤ X ≤ 40, and (ii) X ≥ 45.
Solution: We compute the probabilities for the given intervals using the standard normal distribution. The standard normal variable is given by Z = (X − μ)/σ.
(i) For X = 26, z = (26 − 30)/5 = −0.8; for X = 40, z = (40 − 30)/5 = 2. So P(26 ≤ X ≤ 40) = P(−0.8 ≤ Z ≤ 2) = 0.9772 − 0.2119 = 0.7653.
(ii) For X = 45, z = (45 − 30)/5 = 3. So P(X ≥ 45) = P(Z ≥ 3) = 1 − 0.99865 ≈ 0.0013.
Example-19: The resistance of resistors is a normal random variable X ~ N(μ, σ²) with mean μ = 100 ohms and standard deviation σ = 2 ohms. Find the percentage of resistors with resistances between 98 ohms and 102 ohms, i.e., P(98 ≤ X ≤ 102).
Solution: Converting the bounds to standard normal Z-scores using Z = (X − μ)/σ:
z₁ = (98 − 100)/2 = −1 and z₂ = (102 − 100)/2 = 1
So: P(98 ≤ X ≤ 102) = P(−1 ≤ Z ≤ 1) ≈ 0.6827, i.e., about 68.27% of the resistors.
Example-20: The average seasonal rainfall in a place is 16 inches with an SD of 4 inches.
What is the probability that the rainfall in that place will be between 20 and 24 inches in a
year?
Solution: z₁ = (20 − 16)/4 = 1 and z₂ = (24 − 16)/4 = 2, so P(20 < X < 24) = P(1 < Z < 2) = 0.4772 − 0.3413 = 0.1359
Important Note: After working through the previous examples, let us note some important facts:
(1) P(X < x₁) = F(x₁) = ∫_{−∞}^{x₁} f(x) dx
Hence, P(X < x₁) represents the area under the curve from X = −∞ to X = x₁.
(2) If P(X < x₁) < 0.5, then the point x₁ lies to the left of X = μ and the corresponding value of the standard normal variate will be negative.
(3) If P(X < x₁) > 0.5, then the point x₁ lies to the right of X = μ and the corresponding value of the standard normal variate will be positive.
Example-21: If X is a normal variate with a mean of 120 and a standard deviation of 10, find c such that (i) P(X > c) = 0.02, and (ii) P(X < c) = 0.05.
Solution: (i) 𝑐 = 140.54 (ii) 𝑐 = 103.55
Example-22: Assume that the mean height of Indian soldiers is 68.22 inches with a variance of 10.8 square inches. How many soldiers in a regiment of 1000 would you expect to be over 6 feet tall?
Solution: σ = √10.8 = 3.286; 6 feet = 72 inches, so z₁ = (72 − 68.22)/3.286 = 1.15
p = P(Z > 1.15) = 0.1251
Expected number over 6 ft among 1000 soldiers = 1000 × 0.1251 ≈ 125
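A helper sketch (not from the notes): the standard normal CDF can be written in terms of the error function, Φ(z) = (1 + erf(z/√2))/2, which reproduces the table lookups in Examples 17, 18, and 22.

```python
# Standard normal probabilities via math.erf, checking earlier answers.
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(1 - phi(1.09))                   # Example-17(i):  about 0.1379
print(phi(2) - phi(-0.8))              # Example-18(i):  about 0.7653
print(1 - phi((72 - 68.22) / 3.286))   # Example-22:     about 0.1251
```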
2.3 Exponential Distribution: A continuous random variable X is said to follow the Exponential distribution if its probability density function is given by:

f(x) = λe^(−λx), x ≥ 0, where λ is the rate parameter.

The Exponential distribution is used when we want to model the time between random events (e.g., waiting time).
Note:
• When the times between random events follow the Exponential distribution with rate λ, the total number of events in a time period of length t follows the Poisson distribution with parameter λt.
• The Exponential distribution is a memoryless distribution.

Poisson                                         Exponential
Number of hits to Marwadi University's          Number of minutes between two hits to
website in one minute.                          Marwadi University's website.
Number of soldiers killed by horse-kick         Number of years between horse-kick
per year.                                       deaths of soldiers.
So: events per single unit of time.             So: time per single event.

i.e., the Exponential variate is the time between events that follow a Poisson distribution (i.e., you may think of it as the 'inverse' of the Poisson).
Example-23: Let X be an Exponential random variate with probability density function

f(x) = (1/5)·e^(−x/5) for x > 0, and f(x) = 0 otherwise.

Then find: (i) P(X > 5), (ii) P(3 ≤ X ≤ 6), (iii) Mean, (iv) Variance.
Solution: (i) e^(−1) ≈ 0.3679 (ii) e^(−3/5) − e^(−6/5) ≈ 0.2476 (iii) Mean = 1/λ = 5 (iv) Variance = 1/λ² = 25
Example-24: Let X be an Exponential random variate with probability density function

f(x) = c·e^(−2x) for x > 0, and f(x) = 0 otherwise.

Find (i) P(X > 2), (ii) P(X < 1/c), (iii) P(1/c < X < c).
Solution: c = 2; (i) e^(−4) (ii) 1 − e^(−1) (iii) (1 − e^(−4)) − (1 − e^(−1)) = e^(−1) − e^(−4)
Example-25: The mileage which car owners get with a certain kind of radial tire is a random
variable having an exponential distribution with mean 4000 km. Find the probabilities that
one of these tires will last (i) at least 2000 km (ii) at most 3000 km.
Solution: X ~ Exponential with rate λ = 1/4000.
(i) P(X ≥ 2000) = e^(−2000/4000) = e^(−1/2) = 1/√e ≈ 0.6065
(ii) P(X ≤ 3000) = 1 − e^(−3000/4000) = 1 − e^(−0.75) ≈ 0.5276
As we know, an Exponential variate is the time between events in a Poisson process; in other words, the Exponential variate is the "inverse" of the Poisson.
Let's brush up our knowledge. Some requirements for the Poisson and Exponential variates are:
Example-27: Suppose Dr. Chetan starts monitoring visits to the Marwadi University website from 9:00 am. The waiting time for the next hit does not depend on how long he has already been waiting:
• The exponential distribution has the memoryless (forgetfulness) property.
• This property indicates that the distribution is independent of its past; that means the future occurrence of an event has no relation to whether or not this event has happened in the past.
Mathematically, this property can be expressed as:
If X is exponentially distributed and s & t are two positive real numbers then,
𝑃[(𝑋 > 𝑠 + 𝑡)|(𝑋 > 𝑠)] = 𝑃(𝑋 > 𝑡)
Example-28: The time (in hours) required to repair a machine is exponentially distributed with mean 2 hours. (i) What is the probability that the repair time exceeds 2 hours? (ii) What is the probability that a repair takes at least 11 hours, given that its duration exceeds 8 hours?
Solution: Let X denote the repair time, where X ~ Exp(λ). For an exponentially distributed random variable X with mean μ = 2 hours:

λ = 1/μ = 1/2 per hour

The probability density function is f(x) = λe^(−λx), x ≥ 0, and the cumulative distribution function is F(x) = 1 − e^(−λx).
(i) P(X > 2) = 1 − P(X ≤ 2) = 1 − F(2) = e^(−0.5·2) = e^(−1) ≈ 0.3679
So, the probability that the repair time exceeds 2 hours is approximately 0.3679.
(ii) We need to find the conditional probability P(X ≥ 11 | X > 8).
For an exponential distribution, the memoryless property applies, which states:

P(X ≥ s + t | X > s) = P(X ≥ t), or equivalently, P(X ≥ s + t | X > s) = P(X ≥ s + t) / P(X > s)

Thus:

P(X ≥ 11 | X > 8) = P(X ≥ 3) = e^(−0.5·3) = e^(−1.5) ≈ 0.2231
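A small numeric check (assumed, not from the notes) of the memoryless property for Example-28: the conditional probability P(X ≥ 11 | X > 8) computed from the survival function equals P(X ≥ 3).

```python
# Memoryless property check for Exp(lam) with mean 2 hours.
from math import exp

lam = 0.5                                  # rate per hour (mean = 2 hours)
def survival(x):                           # P(X > x) = e^(-lam * x)
    return exp(-lam * x)

conditional = survival(11) / survival(8)   # P(X >= 11 | X > 8)
print(conditional, survival(3))            # both about 0.2231
```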
2.4 Gamma Distribution: A random variable X is said to follow a Gamma distribution with shape parameter k > 0 and rate parameter λ > 0, written as:
X ~ Gamma(k, λ)
Its probability density function is:

f(x; k, λ) = λ^k·x^(k−1)·e^(−λx) / Γ(k), for x ∈ [0, ∞); λ, k > 0

where the parameter k is called the shape parameter, the parameter λ is called the rate parameter (the same rate we used in the Exponential distribution), and, for integer k, Γ(k) = (k − 1)!.

If X₁, X₂, …, X_k are independent Exponential(λ) lifetimes, the total system lifetime is

T = ∑_{i=1}^{k} X_i

and this whole system lifetime has a gamma distribution, that is, T ~ Gamma(k, λ).
Thus, the sum of k independent exponential random variables has a gamma distribution.
Special Cases:
(i) If we take k = 1, then the pdf of the Gamma distribution becomes the pdf of an Exponential distribution.
(ii) The Chi-square distribution with α degrees of freedom is a special case of the Gamma distribution with k = α/2, λ = 1/2.
We know that the PDF of the Gamma distribution is given by:

f(x; k, λ) = λ^k·x^(k−1)·e^(−λx) / Γ(k), x ≥ 0; λ, k > 0

If X ~ χ²_α, then the PDF of the Chi-squared distribution with α degrees of freedom is given by:

f(x; α/2, 1/2) = f(x; α) = (1/2)^(α/2)·x^(α/2 − 1)·e^(−x/2) / Γ(α/2) for x ≥ 0, α > 0; and 0 otherwise.
Parameters (Mean and Variance) of the Gamma Distribution:
Mean: E(X) = k/λ
Variance: Var(X) = k/λ²
Standard deviation: S.D. = √k/λ
Example-29: A random variable X follows a Gamma distribution with r = 3 and λ = 2. Find its pdf, mean, and variance, and compute P(X ≤ 1.5).

f(x; r, λ) = λ^r·x^(r−1)·e^(−λx) / Γ(r); x ≥ 0; λ, r > 0

For r = 3 and λ = 2:

f(x) = 4x²·e^(−2x); x ≥ 0

Mean = r/λ = 3/2

Variance = σ² = E[X²] − (E[X])² = r/λ² = 3/4 = 0.75

The CDF of the Gamma distribution is:

P(X ≤ x) = γ(r, λx) / Γ(r)

where Γ(r) is the Gamma function, and γ(r, λx) = ∫₀^{λx} t^(r−1)·e^(−t) dt is the lower incomplete Gamma function.

For a Gamma distribution with integer r, the CDF can be expressed as:

P(X ≤ x) = 1 − e^(−λx)·∑_{k=0}^{r−1} (λx)^k / k!

For r = 3 this becomes F(x) = 1 − e^(−λx)·(1 + λx + (λx)²/2!). With x = 1.5 and λx = 3:

P(X ≤ 1.5) = 1 − e^(−3)·∑_{k=0}^{2} 3^k/k! = 1 − e^(−3)·(1 + 3 + 4.5) = 1 − 8.5/e³ ≈ 0.5768
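A sketch (assumed, not from the notes) of this integer-r (Erlang) CDF formula, applied to Example-29 and to Example-30 below:

```python
# Gamma (Erlang) CDF for integer r: P(X <= x) = 1 - e^(-lam*x) * sum (lam*x)^k / k!
from math import exp, factorial

def erlang_cdf(x, r, lam):
    s = sum((lam * x) ** k / factorial(k) for k in range(r))
    return 1 - exp(-lam * x) * s

print(erlang_cdf(1.5, 3, 2))        # about 0.5768 (Example-29)
print(1 - erlang_cdf(12, 3, 0.5))   # about 0.0620 (Example-30, P(X > 12))
```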
Example-30: In a certain city, the daily consumption of electric power in millions of kilowatt-hours can be treated as a random variable having a gamma distribution with parameters λ = 1/2 and r = 3. If the power plant of this city has a daily capacity of 12 million kilowatt-hours, what is the probability that this power supply will be inadequate on any given day?
Solution: We need P(X > 12) = 1 − P(X ≤ 12). Using

P(X ≤ x) = 1 − e^(−λx)·∑_{k=0}^{r−1} (λx)^k / k!

with λx = 6 and r = 3:

P(X > 12) = e^(−6)·(1 + 6 + 18) = 25·e^(−6) ≈ 0.0620
Table: Summary of the Mean and Variance for some important Continuous Probability Distributions (Mean = μ = E[X]; Variance = Var(X) = σ² = E[X²] − (E[X])²; S.D. = σ):

• Uniform: f(x) = 1/(b − a), a ≤ x ≤ b; Mean = (a + b)/2; Variance = (b − a)²/12
• Normal (Gaussian), X ~ N(μ, σ²): f(x) = (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²)), x ∈ R, μ ∈ R, 0 < σ; Mean = μ; Variance = σ²
• Exponential: f(x) = λe^(−λx), x ≥ 0, λ > 0; Mean = 1/λ; Variance = 1/λ²
• Gamma: f(x) = λ^k·x^(k−1)·e^(−λx)/Γ(k), x ≥ 0; λ, k > 0; Mean = k/λ; Variance = k/λ²
Probability Mass Function (PMF):
Example-31: Check whether the function defined below is a probability mass function or not.
(Two tables were given, with X taking the values 0, 1, 2, 3 and 0, 1, 2, 3, 4, 5; their probability rows are not reproduced here.)
Example-32: A sample of 4 items is drawn without replacement from a lot of 10 items containing 3 defectives. Find the probability distribution of the number of defective items in the sample.
• Total items = 10
• Number of defective items = 3
• Number of non-defective items = 7
• Sample size = 4
• Let X be the number of defective items in the sample.

P(X = x) = C(3, x)·C(7, 4 − x) / C(10, 4); x = 0, 1, 2, 3
So, Probability Distribution is given by:
X      0     1     2     3
P(X)   1/6   1/2   3/10  1/30
Properties of CDF
(1) 𝐹(𝑥𝑛 ) = 𝑝(𝑥1 ) + 𝑝(𝑥2 ) + ⋯ + 𝑝(𝑥𝑛 )
(2) ∑𝑛𝑖=1 𝑝(𝑥𝑖 ) = 1
(3) 0 ≤ 𝐹(𝑥𝑖 ) ≤ 1, 𝑖 = 1,2, … , 𝑛
(4)P(𝑎 < 𝑋 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎)
Example-34: A discrete random variable X takes the values –3, –2, –1, 0, 1, 2, 3, such that
P(X = 0) = P(X > 0) = P(X < 0) and P(X = –3) = P(X = –2) = P(X = –1) = P(X = 1) = P(X =
2) = P(X = 3). Obtain the probability distribution and the cumulative distribution function of
X.
Solution: Given conditions:
P(X = −3) = P(X = −2) = P(X = −1) = P(X = 1) = P(X = 2) = P(X = 3) = a (say),
and P(X = 0) = b (say). So the distribution is:

X          −3   −2   −1   0   1   2   3
P(X = x)    a    a    a   b   a   a   a
Then, from the given condition P(X = 0) = P(X > 0) = P(X < 0):
b = 3a
Also, we know that the total probability is 1, which gives
6a + b = 1
6a + 3a = 9a = 1 ⇒ a = 1/9 and b = 3a = 1/3.

X                        −3    −2    −1    0     1     2     3
PMF = P(X = x) = p(x)    1/9   1/9   1/9   1/3   1/9   1/9   1/9
CDF = F(x) = P(X ≤ x)    1/9   2/9   1/3   2/3   7/9   8/9   1
that is:

F(x) =
  0,    x < −3
  1/9,  −3 ≤ x < −2
  2/9,  −2 ≤ x < −1
  1/3,  −1 ≤ x < 0
  2/3,  0 ≤ x < 1
  7/9,  1 ≤ x < 2
  8/9,  2 ≤ x < 3
  1,    3 ≤ x
Note: CDF is always non-decreasing and ends at 1.
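A brief sketch (not from the notes) of building this CDF from the PMF of Example-34 by accumulating probabilities, which also confirms the non-decreasing, ends-at-1 behaviour:

```python
# CDF of Example-34 from its PMF using exact fractions.
from fractions import Fraction as F
from itertools import accumulate

xs = [-3, -2, -1, 0, 1, 2, 3]
pmf = [F(1, 9)] * 3 + [F(1, 3)] + [F(1, 9)] * 3
cdf = list(accumulate(pmf))
print(cdf)              # [1/9, 2/9, 1/3, 2/3, 7/9, 8/9, 1]
print(cdf[-1] == 1)     # True
```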
Note: If φ(X) is a function of a discrete random variable X, then the expectation of φ(X) is given by

E(φ(X)) = ∑_x φ(x)·p(x)
Properties of Mean:
(1) 𝐸(𝑘) = 𝑘, where 𝑘 is a constant.
(2) 𝐸(𝑘𝑋) = 𝑘𝐸(𝑋)
(3) 𝐸(𝑎𝑋 ± 𝑏) = 𝑎𝐸(𝑋) ± 𝑏
(4) 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌), provided 𝐸(𝑋) and 𝐸(𝑌) exist
(5) 𝐸(𝑋𝑌) = 𝐸(𝑋)𝐸(𝑌) if X and Y are independent.
Variance:
The variance of the probability distribution of a discrete random variable X is given by
Var(X) = σ² = E[(X − μ)²]
= E(X² − 2Xμ + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − 2μ² + μ²
= E(X²) − μ² = E[X²] − (E[X])²
Properties of Variance:
(1) 𝑉𝑎𝑟(𝑘) = 0
(2) 𝑉𝑎𝑟(𝑋 + 𝑘) = 𝑉𝑎𝑟(𝑋)
(3) 𝑉𝑎𝑟(𝑘𝑋) = 𝑘 2 𝑉𝑎𝑟(𝑋)
(4) 𝑉𝑎𝑟(𝑎𝑋 ± 𝑏) = 𝑎2 𝑉𝑎𝑟(𝑋)
(5) 𝑉𝑎𝑟(𝑋 + 𝑌) = 𝑉𝑎𝑟(𝑋) + 𝑉𝑎𝑟(𝑌) + 2. 𝐶𝑜𝑣(𝑋, 𝑌)
If X and Y are independent random variables, then 𝐶𝑜𝑣(𝑋, 𝑌) = 0
(6) 𝑉𝑎𝑟(𝑋 − 𝑌) = 𝑉𝑎𝑟(𝑋) + 𝑉𝑎𝑟(𝑌) − 2. 𝐶𝑜𝑣(𝑋, 𝑌)
(7) 𝑉𝑎𝑟(𝑎𝑋 + 𝑏𝑌) = 𝑎2 𝑉𝑎𝑟(𝑋) + 𝑏 2 𝑉𝑎𝑟(𝑌) + 2𝑎𝑏. 𝐶𝑜𝑣(𝑋, 𝑌)
Where, 𝐶𝑜𝑣(𝑋, 𝑌) represents the covariance between random variables X and Y.
𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋 )(𝑌 − 𝜇𝑌 )]
Or, equivalently:
𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[𝑋𝑌] − 𝐸[𝑋]. 𝐸[𝑌]
where

E[X] = ∑_x x·P(x) if X is a discrete random variable; E[X] = ∫ x·f_X(x) dx if X is a continuous random variable.

And

E[XY] = ∑_x ∑_y xy·P(x, y) if (X, Y) is discrete; E[XY] = ∫∫ xy·f(x, y) dx dy if (X, Y) is continuous.

Example: Consider the joint distribution:
X Y P(X,Y)
1 2 0.2
1 4 0.3
2 2 0.1
2 4 0.4
Compute 𝐸[𝑋], 𝐸[𝑌], 𝐸[𝑋𝑌]
𝐸[𝑋] = 1(0.2 + 0.3) + 2(0.1 + 0.4) = 1.5
𝐸[𝑌] = 2(0.2 + 0.1) + 4(0.3 + 0.4) = 3.4
𝐸[𝑋𝑌] = (1)(2)(0.2) + (1)(4)(0.3) + (2)(2)(0.1) + (2)(4)(0.4) = 5.2
Hence,
𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋 )(𝑌 − 𝜇𝑌 )] = 𝐸[𝑋𝑌] − 𝐸[𝑋]. 𝐸[𝑌] = 5.2 − (1.5)(3.4) = 0.1
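A compact sketch (not from the notes) of the same computation directly from the joint probability table:

```python
# E[X], E[Y], E[XY] and Cov(X, Y) from a discrete joint distribution.
joint = {(1, 2): 0.2, (1, 4): 0.3, (2, 2): 0.1, (2, 4): 0.4}

ex  = sum(x * p for (x, y), p in joint.items())
ey  = sum(y * p for (x, y), p in joint.items())
exy = sum(x * y * p for (x, y), p in joint.items())
cov = exy - ex * ey
print(ex, ey, exy, cov)   # 1.5 3.4 5.2 0.1 (up to floating-point rounding)
```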
Example: Check whether f(x, y) = 8xy for 0 < x < 1, 0 < y < 1 is a valid joint pdf; if not, correct it and compute Cov(X, Y).

∫₀¹ ∫₀¹ 8xy dx dy = 2

So it is not a valid pdf, because the total probability is 2 rather than 1. Dividing by 2, the corrected joint pdf is

f(x, y) = 4xy for 0 < x < 1, 0 < y < 1
E[X] = ∫₀¹ ∫₀¹ x·f(x, y) dx dy = ∫₀¹ ∫₀¹ x·4xy dx dy = 2/3

E[Y] = ∫₀¹ ∫₀¹ y·f(x, y) dx dy = 2/3

E[XY] = ∫₀¹ ∫₀¹ xy·f(x, y) dx dy = 4/9
Hence,
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − E[X]·E[Y] = 4/9 − (2/3)·(2/3) = 0
Example-38: The monthly demand of a product is known to have the following probability
distribution. Find the expected demand for the product. Also, compute the variance.
Demand (x)         1      2      3      4      5      6      7      8
Probability p(x)   0.08   0.12   0.19   0.24   0.16   0.10   0.07   0.04
Hint: E[X]= 4.06, Var(X) = 3.2164
Example-39: Let X be a random variable with E(X) = 10 and Var (X) = 25. Find the positive
values of a and b such that Y = aX – b has an expectation of 0 and a variance of 1.
Hint: a=1/5, b= 2
Mean of a Continuous Random Variable:

μ = E(X) = ∫_{−∞}^{∞} x·f(x) dx

Note: If φ(X) is a function of a continuous random variable X, then the expectation of φ(X) is given by

E(φ(X)) = ∫_{−∞}^{∞} φ(x)·f(x) dx
Standard deviation: SD = √Var(X) = 𝜎
Example-40: Check whether the following function f(x) is a probability density function or
not. Also, find the probability that the variable having this density falls in the interval [1, 2].
f(x) = e^(−x) for x ≥ 0, and f(x) = 0 for x < 0.

Hint: f(x) is a valid pdf since ∫₀^∞ e^(−x) dx = 1; P(1 ≤ X ≤ 2) = e^(−1) − e^(−2) ≈ 0.233
Example-41: For the following PDF of a R.V. X, find (i) the value of k and the probabilities
that a random variable having this probability density will take on a value (ii) between 0.1
and 0.2, and, (iii) greater than 0.5.
f(x) = k(1 − x²) for 0 < x < 1, and f(x) = 0 otherwise.

Hint: (i) k = 3/2 (ii) 0.1465 (iii) 5/16 = 0.3125
Example-42: A continuous random variable X has following PDF 𝑓(𝑥). Find a and b such
that (i) P(X ≤ a) = P(X > a) and (ii) P(X > b) = 0.05, where, 0 < 𝑎, 𝑏 < 1.
f(x) = 3x² for 0 < x < 1, and f(x) = 0 otherwise.
Hint: (i) 0.7937 (ii) 0.983
Example-43: Find the constant k such that the following function f(x) is a PDF. Also, find
the cumulative distribution function F(x) and P(1 < X ≤ 2).
f(x) = kx² for 0 < x < 3, and f(x) = 0 otherwise.

Solution: (i) k = 1/9 (ii) F(x) = x³/27 for 0 < x < 3 (iii) P(1 < X ≤ 2) = 7/27
Example-44: If the density function of a random variable X is given as below then find (i)
value of k, (ii) Expectation of X, (iii) Variance, (iv) SD.
f(x) = kx(1 − x) for 0 < x < 1, and f(x) = 0 otherwise.

Solution: (i) k = 6 (ii) E(X) = 1/2 (iii) Var(X) = 1/20 (iv) SD = 1/√20
Example-45: A continuous random variable X has the following pdf. Find E(X² − 5X + 3).

f(x) =
  x/2,       0 < x ≤ 1
  1/2,       1 < x ≤ 2
  (3 − x)/2, 2 < x ≤ 3
  0,         otherwise

Hint for Solution: With φ(x) = x² − 5x + 3,

E[φ(X)] = ∫₀³ φ(x)·f(x) dx = −11/6

2nd Method: E[φ(X)] = E[X²] − 5·E[X] + 3, where

E[X] = 3/2, E[X²] = 8/3
For a random variable X, the rth moment about the origin (also called the raw moment) is:
μ′_r = E[X^r]
For a random variable X, the rth moment about the mean (also called the central moment) is:
μ_r = E[(X − μ)^r]
Kurtosis is a statistical measure that tells us how sharp or flat the peak of a distribution is,
compared to a normal (bell-shaped) distribution.
Kurtosis helps you understand: (i) Is the data peaked (with extreme values)? (ii) Or is it flat
and spread out?
Table: Types of Kurtosis

Type          Description
Mesokurtic    Normal kurtosis (like a bell curve), kurtosis = 3
Leptokurtic   Sharper peak and heavier tails than normal, kurtosis > 3
Platykurtic   Flatter peak and lighter tails than normal, kurtosis < 3
(Sample) Excess Kurtosis = [n(n + 1) / ((n − 1)(n − 2)(n − 3))]·∑_{i=1}^{n} ((x_i − x̄)/s)⁴ − 3(n − 1)² / ((n − 2)(n − 3))

Where:
• The term ∑_{i=1}^{n} ((x_i − x̄)/s)⁴ captures the fourth moment about the mean.
• The adjustment factor n(n + 1)/((n − 1)(n − 2)(n − 3)) corrects for bias in small samples.
• Subtracting the second term gives us excess kurtosis (kurtosis relative to the normal distribution).
(1) Kurtosis for a Population: For a dataset with mean μ and standard deviation σ,

Kurtosis = (1/n)·∑_{i=1}^{n} ((x_i − μ)/σ)⁴
Where
• μ = population mean
• σ = population standard deviation
Note: Excess Kurtosis = Kurtosis – 3, because the normal distribution has kurtosis = 3
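An illustrative sketch (not from the notes) of the population formula on a small made-up dataset; normally distributed data would give kurtosis near 3 and excess kurtosis near 0.

```python
# Population kurtosis and excess kurtosis from the formula above.
def population_kurtosis(data):
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    return sum(((x - mu) / sigma) ** 4 for x in data) / n

data = [2.1, 2.5, 2.9, 3.0, 3.2, 3.6, 4.0, 4.4]  # hypothetical values
k = population_kurtosis(data)
print(k, k - 3)   # kurtosis, excess kurtosis
```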
Functions of a Random Variable: To find the distribution of Y = g(X):
• If X is discrete, find P(Y = y) = P(X = x such that g(x) = y).
• If X is continuous, use transformation techniques (e.g., change of variables, Jacobians for multiple variables).
1. If X ~ N(μ, σ²), then the 2nd moment about the mean, that is, the variance, is σ².
2. If 𝑌 = 𝑋 2 , and 𝑋 ∼ 𝑁(0,1), then:
• 𝐸[𝑌] = 𝐸[𝑋 2 ] = 1
• Y follows a chi-squared distribution with 1 degree of freedom.
The Law of Large Numbers is a fundamental theorem in probability that describes the result of performing the same experiment a large number of times.
Definition:
The Law of Large Numbers states that as the number of trials (or observations) increases,
the sample mean of the observed outcomes gets closer to the expected (theoretical) mean.
i.e. when a random experiment is repeated many times, the average result will tend to be
close to the expected value.
Mathematical Statement:
Let X₁, X₂, …, X_n be independent and identically distributed (i.i.d.) random variables with mean μ = E[X_i]. Then:

X̄_n = (1/n)·∑_{i=1}^{n} X_i → μ as n → ∞

That is, the sample mean X̄_n converges to the population mean μ as the number of observations n increases.
Markov Inequality:
For any random variable 𝑋 ≥ 0 and 𝜀 > 0: 𝑃(𝑋 > 𝜀) ≤ 𝐸[𝑋]/𝜀
Chebyshev's Inequality:
If 𝐸[𝑋] = 𝜇, and 𝑉𝑎𝑟(𝑋) = 𝜎² < ∞, then: 𝑃(|𝑋 − 𝜇| > 𝜀) ≤ 𝑉𝑎𝑟(𝑋)/𝜀²
Example-48: Let X₁, …, X₁₀₀₀ ~ Poisson(3). Find P(|X̄₁₀₀₀ − 3| > 0.2) using Chebyshev's inequality.

Var(X̄₁₀₀₀) = Var(X_i)/n = σ²/n = 3/1000 = 0.003

Chebyshev's inequality gives, for ε = 0.2:

P(|X̄₁₀₀₀ − 3| > 0.2) ≤ 0.003 / (0.2)² = 0.075
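A simulation sketch (assumed, not from the notes) comparing this bound with the empirical probability; Chebyshev is conservative, and the true probability here is far smaller than 0.075.

```python
# Empirical check of Example-48 against the Chebyshev bound 0.075.
import random
from math import exp

def poisson_sample(lam):
    """Knuth's method for one Poisson(lam) draw."""
    L, k, p = exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials, hits = 1000, 0
for _ in range(trials):
    mean = sum(poisson_sample(3) for _ in range(1000)) / 1000
    if abs(mean - 3) > 0.2:
        hits += 1
print(hits / trials)   # typically about 0.000-0.002, well under the bound 0.075
```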
Example: Consider tossing a fair coin, and define the indicator variable

X_i = 1 if the ith toss is a Head, and X_i = 0 if it is a Tail.

Let's toss the coin n times and compute the sample mean X̄_n = (1/n)·∑ X_i; by the Law of Large Numbers, X̄_n → E[X_i] = 1/2 as n → ∞.
The Central Limit Theorem (CLT) is one of the cornerstone results in probability and
statistics. It explains why many distributions tend to look "normal" (bell-shaped) when you
average or sum a large number of random variables.
Statement of the Theorem
Informal Statement:
If you take a large number of independent and identically distributed (i.i.d.) random
variables, each with a finite mean and variance, then the distribution of their properly
normalized sum (or average) will tend toward a normal (Gaussian) distribution, regardless of
the original distribution of the variables.
Formal Statement: Let X̄_n be the sample mean of n i.i.d. random variables with mean μ and standard deviation σ, and define

Z_n = (X̄_n − μ) / (σ/√n); then Z_n ~ N(0, 1) as n → ∞

This means the distribution of Z_n approaches the standard normal distribution as n → ∞.
Note: Here, in place of the transformation Z = (X − μ)/σ = (X − E[X])/√Var(X), we are taking Z_n = (X̄_n − μ)/(σ/√n). It seems that we are taking a different transformation, but in fact we are using the same transformation, applied to X̄_n. To see this, let us consider

X̄_n = (1/n)·(X₁ + X₂ + ⋯ + X_n)

Then:

E[X̄_n] = E[(1/n)·(X₁ + X₂ + ⋯ + X_n)] = (1/n)·(E[X₁] + E[X₂] + ⋯ + E[X_n]) = (1/n)·(μ + μ + ⋯ + μ) = μ

Var(X̄_n) = Var((1/n)·(X₁ + X₂ + ⋯ + X_n)) = (1/n²)·(Var(X₁) + Var(X₂) + ⋯ + Var(X_n)) = (1/n²)·n·σ² = σ²/n

S.D.(X̄_n) = σ/√n

So: Z_n = (X̄_n − E[X̄_n]) / √Var(X̄_n) = (X̄_n − μ)/(σ/√n)
So, we can say that even though the original population (die roll outcomes) is uniform (not
normal), the distribution of sample means is approximately normal for sufficiently large n
(=1000), as guaranteed by the Central Limit Theorem.
Example-51: A company finds that the time to process a customer request is normally
distributed with mean μ = 20 minutes and standard deviation σ = 5 minutes. Find the
probability that the average processing time of 36 customers is:
(i) Less than 19 minutes (ii) Between 19 and 21 minutes
Hint: Use Z = (X̄_n − μ)/(σ/√n) and then look up the value Φ(Z) in the standard normal table.
Solution: Z = (X̄_n − 20)/(5/6), since σ/√n = 5/√36 = 5/6.
(i) For X̄ = 19, Z = (19 − 20)/(5/6) = −6/5 = −1.2
So, P(X̄ < 19) = P(Z < −1.2) = P(−∞ < Z < −1.2) = P(1.2 < Z < ∞)
= P(0 < Z < ∞) − P(0 < Z < 1.2) = 0.5 − 0.3849 = 0.1151
(ii) P(19 < X̄ < 21) = P(−1.2 < Z < (21 − 20)/(5/6)) = P(−1.2 < Z < 1.2)
= 2·P(0 < Z < 1.2) = 2 × 0.3849 = 0.7698
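A verification sketch (assumed, not from the notes) for Example-51: the two answers via the normal CDF, plus a direct Monte Carlo check of part (i).

```python
# CLT answers for Example-51 and a simulation check.
import random
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 20, 5, 36
se = sigma / sqrt(n)                               # standard error = 5/6
print(phi((19 - mu) / se))                         # (i)  about 0.1151
print(phi((21 - mu) / se) - phi((19 - mu) / se))   # (ii) about 0.7698

trials = 100_000
means = (sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials))
count = sum(1 for m in means if m < 19)
print(count / trials)                              # empirically about 0.115
```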
Appendix
1. Standard Normal Z - Table
2. Important Integral: ∫_{−∞}^{∞} x²·(1/√(2π))·e^(−x²/2) dx = 1; this is a known standard integral.

Let

φ(x) = (1/√(2π))·e^(−x²/2)

Recognize that φ(x) is an even function. We use the known result that the above integral is a classic Gaussian-type integral involving an even power of x, of the form:

∫_{−∞}^{∞} x²·e^(−ax²) dx = √π / (2·a^(3/2)), for a > 0
Derivation Sketch: We know that the standard Gaussian integral is given by:

∫_{−∞}^{∞} e^(−ax²) dx = √(π/a)

Taking the derivative with respect to a under the integral sign:

d/da (∫_{−∞}^{∞} e^(−ax²) dx) = ∫_{−∞}^{∞} (d/da) e^(−ax²) dx = ∫_{−∞}^{∞} −x²·e^(−ax²) dx

So:

∫_{−∞}^{∞} x²·e^(−ax²) dx = −(d/da)(√(π/a))

After differentiating, we get:

∫_{−∞}^{∞} x²·e^(−ax²) dx = √π / (2·a^(3/2))

Setting a = 1/2 gives ∫_{−∞}^{∞} x²·e^(−x²/2) dx = √(2π), and dividing by √(2π) confirms that the original integral equals 1.