
Sampling CLT CI

This document provides an overview of key concepts in statistical inference, including population sampling, the sampling distribution of means, and the central limit theorem (CLT). It discusses how the CLT allows the sampling distribution of sample means to be approximated as a normal distribution when the sample size is sufficiently large. It also covers the calculation of z-scores and t-scores, and how the t-distribution is used when the population standard deviation is unknown or the sample size is small. Examples are provided to illustrate concepts like the sampling distribution of means and applying the CLT.

Uploaded by

M. Amin Qureshi
Copyright
© All Rights Reserved

Population Sampling, DF, Sampling Distribution of Means, CLT, Estimation of Mean and Confidence Interval (CI)

Business Statistics/ Statistical Inference

Lecture Prepared by Ikram-E-Khuda


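Recap of Previous Lecture
[Figure: areas under the standard normal curve and the z values that bound them]
For 68% area: 68% lies to the left of z = 0.47; 68% lies to the right of z = -0.47; 68% lies between z = -1 and z = 1.
For 95% area: 95% lies to the left of z = 1.65; 95% lies to the right of z = -1.65; 95% lies between z = -1.96 and z = 1.96.
For 99.7% area: 99.7% lies to the left of z = 2.75; 99.7% lies to the right of z = -2.75; 99.7% lies between z = -2.97 and z = 2.97.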
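The z values in this recap can be reproduced from the standard normal distribution. The sketch below assumes Python 3.8+ and uses `statistics.NormalDist.inv_cdf`; the helper names `one_tailed_z` and `central_z` are illustrative. The exact values differ slightly from the rounded ones on the slide (for example 1.645 rather than 1.65, and about 0.99 rather than 1 for the central 68%).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def one_tailed_z(area: float) -> float:
    """z such that the area to the LEFT of z equals `area` (use -z for the right tail)."""
    return STD_NORMAL.inv_cdf(area)


def central_z(area: float) -> float:
    """Positive z such that the area between -z and +z equals `area`."""
    return STD_NORMAL.inv_cdf(0.5 + area / 2)


for area in (0.68, 0.95, 0.997):
    print(f"{area:.1%} area: one-tailed z = {one_tailed_z(area):.3f}, "
          f"central z = ±{central_z(area):.3f}")
```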
Empirical Rule



Population Sampling
• Population sampling is the process of taking a subset of subjects that is representative of the entire population.

• The sample must be large enough to warrant statistical analysis.

• Sampling is usually done because it is impossible to test every single individual in the population.

• It is also done to save time, money and effort while conducting the research.

Ref: https://explorable.com/population-sampling
Sampling Analytics
• Suppose we have a dataset (random variable) X:

X = 1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.5, 1.6, 1.4, 1.7, 1.8, 1.9, 2.1, 1.8

If we use all of the values of X to find its mean, variance, standard deviation, skewness and other statistical parameters, then X is termed the population.

(This is what we did before the midterms.)

But imagine a situation in which, for whatever reason, you do not take all the values of X but only a few of them, i.e. a subset of X; e.g. you randomly take X1 = 1.2, 1.3, 1.5, 1.2.

This subset X1 is called a sample of X. Furthermore, if we want to generalize the results of the statistical analysis performed on X1 to X, i.e. learn about the population (X) from the sample (X1), the process is called statistical inference.

This is the research process, and it is the topic of discussion after the midterms.
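The population/sample distinction above can be shown with the slide's own numbers: the mean of all of X is a population parameter, while the mean of X1 (or of any other random subset) is a sample statistic that only estimates it. This is a minimal sketch using the standard `statistics` and `random` modules; the extra random sample and its seed are illustrative assumptions.

```python
import random
from statistics import mean

# Population X from the slide (all 14 values).
X = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.5, 1.6, 1.4, 1.7, 1.8, 1.9, 2.1, 1.8]
# The subset X1 quoted in the text: a sample of X.
X1 = [1.2, 1.3, 1.5, 1.2]

population_mean = mean(X)   # parameter: computed from every value of X
sample_mean = mean(X1)      # statistic: computed from the sample only

# Another random sample of 4 values, drawn without replacement (seeded for repeatability).
rng = random.Random(1)
another_sample = rng.sample(X, k=4)

print(f"population mean        = {population_mean:.4f}")
print(f"sample mean of X1      = {sample_mean:.4f}")
print(f"mean of another sample = {mean(another_sample):.4f}")
```

Different samples give different sample means; statistical inference is about what those sample means can tell us about the single population mean.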
Research Process

Ref: https://towardsdatascience.com/what-is-the-difference-between-population-and-sample-e13d17746b16
Naming Conventions
• Usually we use Greek letters as symbols for population parameters and English (Latin) letters for sample statistics.

• If the same symbol is used for both, a subscript must specify whether it refers to the population or the sample.

• The table below shows some examples.

Statistical Parameter    | Population  | Sample
Mean                     | $\mu$       | $\bar{x}$
Variance                 | $\sigma^2$  | $s^2$
Standard Deviation       | $\sigma$    | $s$
Correlation Coefficient  | $\rho$      | $r$
Regression Coefficient   | $\beta$     | $b$
Skewness                 |             |


Calculation Methods
• Note that some calculation methods differ between a population and a sample.

• Let's consider the mean, variance and standard deviation.

Statistical Parameter | Population Calculation                        | Sample Calculation
Mean                  | $\mu = \frac{\sum x}{N}$                      | $\bar{x} = \frac{\sum x}{N}$
Variance              | $\sigma^2 = \frac{\sum (x-\mu)^2}{N}$         | $s^2 = \frac{\sum (x-\bar{x})^2}{N-1}$
Standard Deviation    | $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}$    | $s = \sqrt{\frac{\sum (x-\bar{x})^2}{N-1}}$
Remarks               | Here N is the population size                 | Here N is the sample size
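The only difference between the two variance formulas is the denominator: N for a population, N - 1 for a sample. The sketch below writes both out by hand and checks them against Python's `statistics.pvariance` (population) and `statistics.variance` (sample), which follow the same two conventions. The helper names are illustrative, and the data is the sample X1 from the Sampling Analytics slide.

```python
import math
from statistics import pvariance, variance


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N, dividing by the population size N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2 = sum((x - xbar)^2) / (N - 1), dividing by the degrees of freedom."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


data = [1.2, 1.3, 1.5, 1.2]  # the sample X1 from the Sampling Analytics slide

print("population formula: var =", population_variance(data),
      "sd =", math.sqrt(population_variance(data)))
print("sample formula:     var =", sample_variance(data),
      "sd =", math.sqrt(sample_variance(data)))

# The standard library follows the same two conventions.
assert math.isclose(population_variance(data), pvariance(data))
assert math.isclose(sample_variance(data), variance(data))
```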


Calculation Methods
• The N - 1 denominators in the sample formulae are called the degrees of freedom, or DF.

• It is important to understand the concept of DF, because it is encountered many times in statistical inference.

• DF is the number of values that are free to change or vary in a calculation.


Degree of Freedom
• In statistics we follow a simple rule.

• The DF in the calculation of a statistical parameter equals N - (the total number of other estimates required to calculate that statistical parameter).

• Since the sample mean calculation requires no other estimate, its DF = N.

• However, the sample variance calculation requires the estimate of the mean ($\bar{x}$), therefore its DF = N - 1.
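The idea of "values that are free to vary" can be made concrete: once the mean of N values has been fixed (estimated), only N - 1 of the values can be chosen freely, and the last one is forced. This is why the sample variance, which uses the estimated mean, has DF = N - 1. The helper `forced_last_value` below is hypothetical and exists only for illustration.

```python
def forced_last_value(free_values, fixed_mean, n):
    """Hypothetical helper: once the mean of n values is fixed and n - 1 values
    are chosen freely, the remaining value is determined (it has no freedom)."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n - 1 values may be chosen freely")
    return fixed_mean * n - sum(free_values)


# N = 4 values whose mean must equal 1.3: choose any 3 of them...
free = [1.2, 1.3, 1.5]
last = forced_last_value(free, fixed_mean=1.3, n=4)
print(f"the 4th value is forced to be {last:.2f}")

# ...a different free choice forces a different last value, but still only 3 were free.
print(f"{forced_last_value([1.0, 1.0, 1.0], fixed_mean=1.3, n=4):.2f}")
```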
Degree of Freedom
• Note: Using the idea of DF, statistical formulae can be summarized. For example, the sample variance formula can be summarized as:

$s^2 = \frac{SS}{DF}$, where $SS = \sum (x-\bar{x})^2$ is the sum of squares and $DF = N - 1$.

The above notation of variance in terms of SS and DF is very commonly used in ANOVA, Correlation and Regression (topics to study later).
Sampling Distribution
• A sampling distribution is a frequency or probability distribution of a particular statistic computed from every possible sample of size N taken from the population.

• If the sampling distribution shows the probability distribution of the means of all the possible samples from a population, it is called the sampling distribution of means.
Sampling Distribution of Mean
• In this case the dataset variable (random variable) is the mean, and it contains all the values that are the means of all possible samples from the population.

• Suppose there is a population from which 36 samples of size N can be taken. The means of all those 36 samples are taken as the values of a random variable ($\bar{X}$). We call this random variable the sampling distribution of means. It can, for example, be written as:

$\bar{X} = \bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_{36}$

We can further summarize the sampling distribution as a frequency distribution of $\bar{X}$, using grouped statistics that show the frequencies of the values present in $\bar{X}$.
Central Limit Theorem (CLT)
This is one of the many important theorems in statistical inference. The CLT can be understood as follows.

1) The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed.

To treat the sampling distribution as normal, the sample size must be large (a common rule of thumb is greater than 30); the larger, the better.

2) The expected value of this normal sampling distribution of means is the population mean: $E(\bar{X}) = \mu_{\bar{x}} = \mu$

3) The standard deviation of this normal distribution is called the standard error (SE). For the sampling distribution of means it is called the standard error of the mean (SEM, $\sigma_{\bar{x}}$).

Every standard error has its own formula, depending on which statistic it describes. For samples of size N and a population standard deviation of σ, the SEM is given as follows:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$ => standard deviation of the sampling distribution of means
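All three CLT statements can be checked by simulation: draw many random samples from a clearly non-normal population, record each sample mean, and compare the mean and standard deviation of those sample means with μ and σ/√N. This sketch assumes an exponential population with mean 22 (so σ = 22, the same population as Problem 3 later in the lecture), N = 80, 5000 simulated samples and a fixed seed; all of these choices are illustrative.

```python
import math
import random
from statistics import mean, pstdev


def sample_means(draw, sample_size, num_samples, seed=0):
    """Return the means of `num_samples` random samples of size `sample_size`,
    where `draw(rng)` returns one value from the population."""
    rng = random.Random(seed)
    return [math.fsum(draw(rng) for _ in range(sample_size)) / sample_size
            for _ in range(num_samples)]


# Assumed population for illustration: exponential with mean 22, so sigma = 22 too.
MU = SIGMA = 22.0


def draw_exponential(rng):
    return rng.expovariate(1 / MU)


N = 80
means = sample_means(draw_exponential, sample_size=N, num_samples=5000)

print(f"mean of sample means = {mean(means):.2f}   (CLT: mu = {MU})")
print(f"SD of sample means   = {pstdev(means):.3f}  (CLT: sigma/sqrt(N) = {SIGMA / math.sqrt(N):.3f})")
```

A histogram of `means` would look close to a bell curve even though the exponential population itself is strongly skewed.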
Z-Scores
• This means that the usual z score formula for a normal random variable X, which was given as

$z = \frac{x - \mu}{\sigma}$

will change for the sampling distribution of means.

• It will now be given as:

$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}$
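The change in the z formula matters in practice: the same distance from μ gives a much larger z for a sample mean, because sample means vary less than single values. This minimal sketch uses the numbers of Problem 1 later in the lecture (μ = 60, σ = 12.5, N = 120); the function names are illustrative.

```python
import math


def z_score(x, mu, sigma):
    """Usual z score of a single value x of a normal random variable X."""
    return (x - mu) / sigma


def z_score_of_mean(xbar, mu, sigma, sample_size):
    """z score of a sample mean: divide by the standard error sigma / sqrt(N)."""
    standard_error = sigma / math.sqrt(sample_size)
    return (xbar - mu) / standard_error


# Same distance from mu, very different z: means of 120 values vary far less.
print(round(z_score(62, 60, 12.5), 4))                # one value
print(round(z_score_of_mean(62, 60, 12.5, 120), 4))   # a mean of 120 values
```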


Z Distribution vs. T Distribution
• According to the above, to follow the CLT, the sample size must be very large, or we should have all the possible samples from the population. In both situations we should know the population standard deviation.

• If the sample size is not very large (less than 30) and/or the population standard deviation is not known, then the standardized sample mean follows a bell-shaped distribution that resembles the normal distribution but has heavier tails; this is called the t distribution.

• The SEM used with the t distribution, computed from the sample standard deviation s, is given as

$SEM = \frac{s}{\sqrt{N}}$


T-Score
• We now obtain a t statistic, not a z statistic, which is given as

$t = \frac{\bar{x} - \mu}{s/\sqrt{N}}$

and which follows a t distribution with N - 1 degrees of freedom (for large N it is approximately normal).

• The DF for the sampling distribution of the mean is $DF = N - 1$.
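The t statistic differs from the z statistic only in using the sample standard deviation s (with its N - 1 denominator) in place of σ. This sketch computes t and its DF for the sample X1 from the Sampling Analytics slide, taking the mean of the whole dataset X as the hypothesised μ; that pairing is just an illustrative choice.

```python
import math
from statistics import mean, stdev


def t_statistic(sample, mu):
    """Return (t, DF) with t = (xbar - mu) / (s / sqrt(N)) and DF = N - 1."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)              # sample SD, N - 1 denominator
    sem = s / math.sqrt(n)         # estimated standard error of the mean
    return (xbar - mu) / sem, n - 1


X = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.5, 1.6, 1.4, 1.7, 1.8, 1.9, 2.1, 1.8]
X1 = [1.2, 1.3, 1.5, 1.2]

t, df = t_statistic(X1, mu=mean(X))
print(f"t = {t:.3f} with DF = {df}")
```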


z Distributions and t Distributions

• We use the z table to solve normal distribution problems.
• We use the t table to solve t distribution problems.
CLT Example



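Example
Consider a population consisting of the values 1, 2, 3, 4, 5 and 6.

Population Mean = (1+2+3+4+5+6)/6 = 3.5
Population Variance = $\frac{\sum (x-\mu)^2}{N}$ = 17.5/6 = 2.92

i.e. Population Standard Deviation is: $\sigma = \sqrt{2.92} = 1.71$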
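The population parameters of this example can be checked directly. Because 1 to 6 is the whole population here, the population formulas (divide by N = 6) apply, which is what `statistics.pvariance` and `statistics.pstdev` compute.

```python
from statistics import pstdev, pvariance

population = [1, 2, 3, 4, 5, 6]

mu = sum(population) / len(population)   # population mean
var = pvariance(population)              # population variance: divide by N = 6
sigma = pstdev(population)               # population standard deviation

print(f"mu = {mu}, variance = {var:.2f}, sigma = {sigma:.2f}")
```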


Sampling Distribution of Means
[Figure: all 36 possible samples of size 2 drawn with replacement from the population 1 to 6, with their means]


Frequency Distribution of Sampling Distribution of Mean

Random Variable ($\bar{x}$) | Frequency | Probability
1.0                         | 1         | 1/36
1.5                         | 2         | 2/36
2.0                         | 3         | 3/36
2.5                         | 4         | 4/36
3.0                         | 5         | 5/36
3.5                         | 6         | 6/36
4.0                         | 5         | 5/36
4.5                         | 4         | 4/36
5.0                         | 3         | 3/36
5.5                         | 2         | 2/36
6.0                         | 1         | 1/36
Total                       | 36        | 1
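The frequency table can be generated by listing every ordered sample of size 2 drawn with replacement from 1 to 6 (6 × 6 = 36 samples) and counting how often each sample mean occurs. This sketch uses `itertools.product` to enumerate the samples and `fractions.Fraction` so the means are exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]

# Every ordered sample of size 2 drawn with replacement: 6 * 6 = 36 samples.
samples = list(product(population, repeat=2))
sample_means = [Fraction(a + b, 2) for a, b in samples]

freq = Counter(sample_means)
for xbar in sorted(freq):
    print(f"{float(xbar):.1f}  {freq[xbar]:>2}  {freq[xbar]}/{len(samples)}")
print(f"Total {len(samples)}")
```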
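Proof of CLT
1) The frequency distribution of the sampling distribution of means is symmetric about 3.5 and peaks in the middle, which is the shape that approaches a normal distribution as the sample size grows.

2) Expected value of $\bar{X}$ = μ:

$E(\bar{X}) = \sum \bar{x}\,P(\bar{x}) = 1.0\left(\tfrac{1}{36}\right) + 1.5\left(\tfrac{2}{36}\right) + 2.0\left(\tfrac{3}{36}\right) + \ldots + 6.0\left(\tfrac{1}{36}\right) = \tfrac{126}{36} = 3.5 = \mu$

3) Variance of the sampling distribution of means ($\sigma_{\bar{x}}^2$):

$\sigma_{\bar{x}}^2 = \sum (\bar{x} - \mu)^2 P(\bar{x}) = (1.0-3.5)^2\left(\tfrac{1}{36}\right) + (1.5-3.5)^2\left(\tfrac{2}{36}\right) + \ldots + (6.0-3.5)^2\left(\tfrac{1}{36}\right) = \tfrac{52.5}{36} = 1.46$

On continuing with the calculation and taking the square root of $\sigma_{\bar{x}}^2$:

$\sigma_{\bar{x}} = \sqrt{1.46} = 1.21 = \frac{\sigma}{\sqrt{N}} = \frac{1.71}{\sqrt{2}}$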
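The two numerical CLT checks in this proof can be verified exactly by enumerating all 36 equally likely samples of size N = 2: the mean of the sample means equals μ = 3.5, and their variance equals σ²/N exactly. `fractions.Fraction` keeps the arithmetic exact so the equality check is not affected by rounding.

```python
import math
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]
N = 2  # sample size in the example

# Means of all 36 equally likely samples (exact arithmetic with Fraction).
means = [Fraction(sum(s), N) for s in product(population, repeat=N)]
p = Fraction(1, len(means))

expected_mean = sum(m * p for m in means)                         # E(xbar)
var_of_means = sum((m - expected_mean) ** 2 * p for m in means)   # sigma_xbar^2

mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

print("E(xbar) =", float(expected_mean), "  mu =", float(mu))
print("sigma_xbar^2 =", round(float(var_of_means), 4),
      "  sigma^2 / N =", round(float(sigma_sq / N), 4))
print("sigma_xbar =", round(math.sqrt(var_of_means), 2))
```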


Estimation of Mean and Confidence Intervals
Business Statistics/ Statistical Inference


Example
Population Mean = (1+2+3+4+5+6)/6 = 3.5
Population Variance = $\frac{\sum (x-\mu)^2}{N}$ = 2.92

i.e. Population Standard Deviation is: $\sigma = \sqrt{2.92} = 1.71$


Example

Can we estimate this population mean using a random sample from the sampling distribution of means?


Sampling Distribution of Means



Population Mean Estimation
Earlier we looked at how z scores and t scores are written. They can be rearranged as follows to estimate the population mean:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm z\frac{\sigma}{\sqrt{N}}$

$t = \frac{\bar{x} - \mu}{s/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm t\frac{s}{\sqrt{N}}$

Correct estimation depends very much on the values of z or t that we take. The z score is used when the population standard deviation is known; the t score is used when it is not known, so that our only option is to use the sample standard deviation.

Here the estimates are made as intervals; these are called interval estimates.
Population Mean Estimation
• Suppose we have a sample from this population with sample mean $\bar{x}$.

• In this example the population standard deviation is known to us, which gives $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} = \frac{1.71}{\sqrt{2}} = 1.21$

• Therefore, using the above equation, the population mean is estimated as follows:
Population Mean Estimation
• $\mu = \bar{x} \pm z\,\sigma_{\bar{x}} = \bar{x} \pm 1.21z$

So the question is: what value of z should we take?

The answer depends on how much confidence (surety) we want that the population mean lies within the corresponding area of the normal distribution. This confidence is called the confidence level of our confidence interval (CI): are we 90%, 95% or 99% confident? The higher the confidence level, the wider the interval and the more likely it is that the interval contains the population mean.
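Since the interval is x̄ ± z·σ_x̄, its half-width (margin of error) does not depend on x̄, only on the confidence level and the standard error. This sketch computes that half-width for several confidence levels using σ_x̄ = σ/√N with the die-style population 1 to 6 and N = 2; the helper name `margin_of_error` is illustrative.

```python
import math
from statistics import NormalDist, pstdev

population = [1, 2, 3, 4, 5, 6]
N = 2                                       # sample size in the example
sem = pstdev(population) / math.sqrt(N)     # sigma_xbar = sigma / sqrt(N), about 1.21


def margin_of_error(confidence, standard_error):
    """Half-width z * SE of a two-sided interval mu = xbar ± z * SE."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-tailed critical z
    return z * standard_error


for conf in (0.68, 0.90, 0.95, 0.99):
    print(f"{conf:.0%} CI: mu = xbar ± {margin_of_error(conf, sem):.2f}")
```

The printed half-widths grow with the confidence level, which is exactly the trade-off described above: more confidence costs a wider interval.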


Population Mean Estimation
• Since we already know that μ has the value 3.5, we can verify whether our estimations are correct or incorrect.

• Let's start with a 68% confidence interval (CI). For a 68% CI, we have z = ±1.

• Hence the estimation equation becomes:

$\mu = \bar{x} \pm (1)(1.21)$ => this gives two limits of μ, which are
lower limit: $\mu = \bar{x} - 1.21$
upper limit: $\mu = \bar{x} + 1.21$ i.e. the interval of our estimation of μ comes out to be $\bar{x} - 1.21 \le \mu \le \bar{x} + 1.21$.

Is this interval estimation of μ correct? ………No, because we know that our μ = 3.5 lies outside this interval.
Population Mean Estimation
• Let's now change the z value, i.e. the CI%. Suppose we now estimate μ at a 95% CI, for which z = ±1.96.

• Then, using the same procedure, we get

$\mu = \bar{x} \pm (1.96)(1.21) = \bar{x} \pm 2.37$ => this gives two limits of μ, which are
lower limit: $\mu = \bar{x} - 2.37$
upper limit: $\mu = \bar{x} + 2.37$ i.e. the interval of our estimation of μ comes out to be $\bar{x} - 2.37 \le \mu \le \bar{x} + 2.37$.

Is this interval estimation of μ correct? ………Yes! Because we know that our μ = 3.5 lies inside this interval.


Population Mean Estimation
• The higher the CI%, the more area is covered, and hence the higher the probability that the interval captures the population mean.

• The % value of a CI, for example 95%, also tells us that if we took 100 samples and built an interval from each, about 95 of those intervals would contain the population mean.


Note:
A confidence interval for the mean is a
range of scores constructed such that the
population mean will fall within this
range in 95% of samples.

The confidence interval is not an interval within which we are 95% confident that the population mean will fall.
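The interpretation in this note can be checked by simulation: draw many samples, build an interval x̄ ± z·σ/√N from each, and count how often the interval contains the true μ. For a 95% CI the proportion should come out close to 0.95. The normal population (μ = 68, σ = 3, N = 36, the setting of Problem 4), the number of trials and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(mu, sigma, n, confidence=0.95, trials=10_000, seed=42):
    """Fraction of simulated samples whose interval xbar ± z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = math.fsum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


# Assumed normal population for illustration: mu = 68, sigma = 3, samples of N = 36.
print(coverage_rate(mu=68, sigma=3, n=36))
```

The 95% describes the long-run behaviour of the procedure across samples; any single computed interval either contains μ or it does not.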


Practice Examples

Business Statistics/ Statistical Inference

Example Problems


Problem 1
You are the Director of Transportation Safety. You are concerned that the average highway speed of all trucks may exceed the 60 miles per hour (mph) speed limit. A random sample of 120 trucks shows a mean speed of 62 mph. Assuming that the population mean is 60 mph and the population standard deviation is 12.5 mph, find the probability that the average speed is greater than or equal to 62 mph.


Solution

• We are provided with the sample information: sample size N = 120, sample mean (speed) $\bar{x}$ = 62 mph, population mean μ = 60 mph and population standard deviation σ = 12.5 mph.

• Using this information, we need to find the probability that the average speed is greater than or equal to 62 mph, i.e. $P(\bar{x} \ge 62)$.

• Since a probability (area) is required, we first calculate the z value and then use the z table to find the required probability.

• The information shows that we are given a sample mean, which comes from the sampling distribution of the mean; therefore we use the following z formula to solve for z:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}$
Graph of the Problem
[Figure: sampling distribution of the mean (sample mean values) centred at μ = 60, with the required area (probability) shaded to the right of $\bar{x}$ = 62]


Solution
• Converting into z gives:

$z = \frac{62 - 60}{12.5/\sqrt{120}} = \frac{2}{1.141} \approx 1.75$


Graph of the Problem
[Figure: standard sampling distribution of the mean (standard values) centred at 0, with the required area (probability) shaded to the right of z = 1.75]


Solution
• Since we cannot read ≥ probabilities directly from our z table, this problem is solved as follows:

$P(\bar{x} \ge 62) = P(z \ge 1.75) = 1 - P(z \le 1.75)$

• Looking into the z table, we get $P(z \le 1.75) = 0.9599$, so

$P(\bar{x} \ge 62) = P(z \ge 1.75) = 1 - P(z \le 1.75) = 1 - 0.9599 = 0.0401$


Answer

4.01% of the time, a random sample of 120 trucks from the population will yield a mean speed of 62 mph or more.
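The Problem 1 calculation can be reproduced without a printed z table using `statistics.NormalDist.cdf`. Because the code uses the unrounded z (about 1.7527) rather than 1.75, the probability comes out near 0.0398 instead of the table-based 0.0401; the conclusion is the same.

```python
import math
from statistics import NormalDist

mu, sigma = 60, 12.5     # population mean and SD (mph)
N, xbar = 120, 62        # sample size and observed sample mean (mph)

sem = sigma / math.sqrt(N)                 # standard error of the mean
z = (xbar - mu) / sem                      # about 1.75
p_at_least = 1 - NormalDist().cdf(z)       # P(xbar >= 62) = 1 - P(z <= z)

print(f"SE = {sem:.4f}, z = {z:.4f}, P = {p_at_least:.4f}")
```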


Problem 2
A study of stress is done among the students on a college campus. The stress scores follow a uniform distribution, with the lowest stress score equal to 1 and the highest equal to 5. The population mean is 3 and the population standard deviation is 1.154.

Using a sample of 75 students, find:

a) The probability that the mean stress score of the sample of 75 students is less than 2

b) The 90th percentile for the mean stress score for a sample of 75 students


Solution (a)

• We are provided with the sample information: sample size N = 75, population mean μ = 3 and population standard deviation σ = 1.154.

• Using this information, we need to find the probability that the mean stress score is less than or equal to 2, i.e. $P(\bar{x} \le 2)$.

• Since a probability (area) is required, we first calculate the z value and then use the z table to find the required probability.

• The information shows that we are given a target sample mean, which comes from the sampling distribution of the mean; therefore we use the following z formula to solve for z:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}$
Graph of the Problem
[Figure: sampling distribution of the mean (sample mean values) centred at μ = 3, with the required area (probability) shaded to the left of $\bar{x}$ = 2]


Solution
• Converting into z gives:

$z = \frac{2 - 3}{1.154/\sqrt{75}} = -7.504$


Graph of the Problem
[Figure: standard sampling distribution of the mean (standard values) centred at 0, with the required area (probability) shaded to the left of z = -7.504]


Solution
• Although we cannot find $P(z \le -7.504)$ in our z table, we can see from the table that the more negative a z value is, the smaller its area to the left. Therefore,

$P(\bar{x} \le 2) = P(z \le -7.504) \approx 0$


Answer (a)

The probability that the mean stress score is less than 2 is about 0.


Solution (b)

For the 90th percentile, we take the one-tailed area from the left-hand side. The z value that gives this area is shown in the following graph and z table:

[Figure: standard normal curve with 90% of the area to the left of the required z, and the z table entry that gives it]


Solution
• For the 90th percentile, the z value is z = 1.28. Hence the corresponding sample mean ($\bar{x}$) for the 90th percentile is calculated as follows:

$\bar{x} = \mu + z\frac{\sigma}{\sqrt{N}} = 3 + 1.28\left(\frac{1.154}{\sqrt{75}}\right) = 3.17$


Answer (b)

The 90th percentile of the sample means of stress scores ($\bar{x}$) is about 3.17. This tells us that 90% of all sample means ($\bar{x}$) are at most 3.17 and 10% are at least 3.17.
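Both parts of Problem 2 can be checked with `statistics.NormalDist`: part (a) uses `cdf` for the left-tail area, and part (b) uses `inv_cdf(0.90)` in place of the table z of 1.28. As a side check, a uniform distribution on [1, 5] has σ = (5 - 1)/√12 ≈ 1.1547, consistent with the 1.154 given in the problem.

```python
import math
from statistics import NormalDist

mu, sigma, N = 3, 1.154, 75
sem = sigma / math.sqrt(N)                 # standard error of the mean

# (a) P(xbar < 2)
z_a = (2 - mu) / sem
p_a = NormalDist().cdf(z_a)

# (b) 90th percentile of xbar = mu + z_0.90 * SE
z_90 = NormalDist().inv_cdf(0.90)          # about 1.2816; the z table gives 1.28
pct_90 = mu + z_90 * sem

print(f"(a) z = {z_a:.3f}, P(xbar < 2) = {p_a:.1e}")
print(f"(b) 90th percentile of xbar = {pct_90:.2f}")
```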
Problem 3
Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included in their basic cell phone contract. The analyst finds that for those people who exceeded the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes. In a (negative) exponential distribution, the mean and standard deviation are the same. Consider a random sample of 80 customers who have exceeded the time allowance included in their basic cell phone contract.

a) Find the probability that the mean excess time used by the 80 customers in the sample is longer than 20 minutes.

b) Find the 95th percentile of the sample mean excess time for samples of 80 customers who exceed their basic contract time.


Solution (a)

• We are provided with the sample information: sample size N = 80.

• Using this information, we need to find the probability that the mean excess time is greater than or equal to 20 minutes, i.e. $P(\bar{x} \ge 20)$.

• Since a probability (area) is required, we first calculate the z value and then use the z table to find the required probability.

• The information shows that we are given a target sample mean, which comes from the sampling distribution of the mean; therefore we use the following z formula to solve for z:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}$
Solution
• Since the question states that the excess time used follows an exponential distribution with a mean of 22 minutes, the population is exponentially distributed with a population mean μ = 22.

• Note that in a (negative) exponential distribution the mean and standard deviation are the same, i.e. in our case

σ = μ = 22.


Graph of the Problem
[Figure: sampling distribution of the mean (sample mean values) centred at μ = 22, with the required area (probability) shaded to the right of $\bar{x}$ = 20]


Solution
• Converting into z gives:

$z = \frac{20 - 22}{22/\sqrt{80}} = \frac{-2}{2.4597} = -0.8131$


Graph of the Problem
[Figure: standard sampling distribution of the mean (standard values) centred at 0, with the required area (probability) shaded to the right of z = -0.8131]


Solution
• Since we cannot read ≥ probabilities directly from our z table, this problem is solved as follows:

$P(\bar{x} \ge 20) = P(z \ge -0.8131) = 1 - P(z \le -0.8131)$

• Looking into the z table, we get $P(z \le -0.8131) = 0.2090$, so

$P(\bar{x} \ge 20) = P(z \ge -0.8131) = 1 - P(z \le -0.8131) = 1 - 0.2090 = 0.791$
Answer (a)
The probability is 0.791 (79.1%) that the mean excess time is more than 20 minutes for a sample of 80 customers who exceed their contracted time allowance.


Solution (b)

For the 95th percentile, we take the one-tailed area from the left-hand side. The z value that gives this area is shown in the following graph and z table:

[Figure: standard normal curve with 95% of the area to the left of the required z, and the z table entry that gives it]


Solution
• For the 95th percentile, the z value is z = 1.65.

Hence the corresponding sample mean ($\bar{x}$) for the 95th percentile is calculated as follows:

$\bar{x} = \mu + z\frac{\sigma}{\sqrt{N}} = 22 + 1.65\left(\frac{22}{\sqrt{80}}\right) = 26.058$


Answer (b)

The 95th percentile of the sample means of excess time ($\bar{x}$) is about 26.058. This tells us that 95% of all sample means ($\bar{x}$) are at most 26.058 and 5% are at least 26.058.
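Problem 3 can be checked the same way, with σ set equal to μ because the population is exponential. For part (b), z = 1.65 reproduces the answer 26.058, while the unrounded 95th-percentile z of about 1.645 gives roughly 26.046; the sketch prints both so the effect of table rounding is visible.

```python
import math
from statistics import NormalDist

mu = 22.0      # mean excess time (minutes)
sigma = mu     # exponential distribution: SD equals the mean
N = 80
sem = sigma / math.sqrt(N)

z_a = (20 - mu) / sem                      # about -0.8131
p_a = 1 - NormalDist().cdf(z_a)            # P(xbar >= 20)

pct_95_table = mu + 1.65 * sem                           # table z = 1.65
pct_95_exact = mu + NormalDist().inv_cdf(0.95) * sem     # z about 1.645

print(f"(a) z = {z_a:.4f}, P = {p_a:.3f}")
print(f"(b) 95th percentile: {pct_95_table:.3f} (z = 1.65), {pct_95_exact:.3f} (z = 1.645)")
```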
Problem 4
a) Suppose scores on Statistics exams are normally distributed with an unknown population mean and a population standard deviation of 3 points. A random sample of 36 scores gives a sample mean of 68. Find a 95% confidence interval (CI) estimate for the population mean exam score.

b) Solve (a), but this time the standard deviation of 3 points is the sample standard deviation instead of the population standard deviation.
Solution (a)
• This is an estimation problem.

• The population mean is to be estimated.

• Since the population standard deviation is known, we use the following formula for population mean estimation:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm z\frac{\sigma}{\sqrt{N}}$


Solution
• $\mu = \bar{x} \pm z\frac{\sigma}{\sqrt{N}}$
Here sample mean $\bar{x}$ = 68
Population standard deviation σ = 3
Sample size N = 36
For a 95% CI, z = ±1.96 (interval estimation)

Substituting all the above values into the estimation equation gives

$\mu = 68 \pm 1.96\left(\frac{3}{\sqrt{36}}\right) = 68 \pm 0.98$
i.e. lower limit estimate of μ = 67.02
and upper limit estimate of μ = 68.98


Solution
Answer (a)
The interval estimate of the population mean at 95% CI is as follows:

$67.02 \le \mu \le 68.98$


Solution (b)
• This is an estimation problem.

• The population mean is to be estimated.

• Since the population standard deviation is not known but the sample standard deviation is, we use the following formula for population mean estimation:

$t = \frac{\bar{x} - \mu}{s/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm t\frac{s}{\sqrt{N}}$


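Solution
• $\mu = \bar{x} \pm t\frac{s}{\sqrt{N}}$
Here sample mean $\bar{x}$ = 68
Sample standard deviation s = 3
Sample size N = 36, so DF = N - 1 = 35
For a 95% CI, t = ±2.030 (from the t table with DF = 35) (interval estimation)

Substituting all the above values into the estimation equation gives

$\mu = 68 \pm 2.030\left(\frac{3}{\sqrt{36}}\right) = 68 \pm 1.015$
i.e. lower limit estimate of μ = 66.985
and upper limit estimate of μ = 69.015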


Solution
Answer (b)
The interval estimate of the population mean at 95% CI is as follows:

$66.985 \le \mu \le 69.015$
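Both intervals of Problem 4 follow from μ = x̄ ± (critical value) × SD/√N; only the critical value changes. Python's standard library has no t-distribution quantile, so the sketch hard-codes t = 2.030, the usual two-tailed 95% table value for DF = 35 (this is an assumption about the table row used; `scipy.stats.t.ppf(0.975, 35)` returns approximately the same number if SciPy is available). The slightly larger t makes interval (b) a little wider than interval (a).

```python
import math
from statistics import NormalDist

xbar, spread, N = 68, 3, 36      # sample mean, SD (sigma in (a), s in (b)), sample size
se = spread / math.sqrt(N)       # 3 / 6 = 0.5

# (a) sigma known: two-tailed 95% critical z
z = NormalDist().inv_cdf(0.975)                 # about 1.96
ci_a = (xbar - z * se, xbar + z * se)

# (b) sigma unknown: 95% critical t for DF = N - 1 = 35, read from a t table (assumed 2.030).
t_crit = 2.030
ci_b = (xbar - t_crit * se, xbar + t_crit * se)

print(f"(a) {ci_a[0]:.2f} <= mu <= {ci_a[1]:.2f}")
print(f"(b) {ci_b[0]:.3f} <= mu <= {ci_b[1]:.3f}")
```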


The t value reference
[Figure: t table used to read the critical t value]


Answer 5

Therefore, 62 students should be surveyed in order to be 95% confident that we are within 2 years of the true population mean age of university students.
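An answer like this typically comes from the standard sample-size formula N = (z·σ/E)², rounded up, where E is the allowed error (2 years here) and z = 1.96 for 95% confidence. This sketch assumes that formula; the population standard deviation of ages behind Answer 5 is not given here, so `SIGMA_AGES` is a hypothetical placeholder that you should replace with the problem's actual value.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest N with z * sigma / sqrt(N) <= margin, i.e. N = ceil((z * sigma / E)^2)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-tailed critical z (1.96 for 95%)
    return math.ceil((z * sigma / margin) ** 2)


# E = 2 years at 95% confidence, as in Answer 5. SIGMA_AGES is a hypothetical
# placeholder: the population SD of ages behind that answer is not stated here.
SIGMA_AGES = 10.0
print(required_sample_size(SIGMA_AGES, margin=2))
```

Rounding up (rather than to the nearest integer) guarantees the margin of error is no larger than E.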

