Population Sampling, DF, Sampling
Distribution of Means, CLT, Estimation of
Mean and Confidence Interval (CI)
Business Statistics/ Statistical
Inference
Lecture Prepared by Ikram-E-Khuda
Recap of Previous Lecture
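For 68% Area
• Area to the left of z: z=0.47
• Area to the right of z: z=-0.47
• Central area: z=-1 to z=1
For 95% Area
• Area to the left of z: z=1.65
• Area to the right of z: z=-1.65
• Central area: z=-1.96 to z=1.96
For 99.7% Area
• Area to the left of z: z=2.75
• Area to the right of z: z=-2.75
• Central area: z=-2.97 to z=2.97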
Empirical Rule
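The z values in this recap can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist`; the helper names `one_tailed_z` and `two_tailed_z` are illustrative and not part of the lecture.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def one_tailed_z(area):
    """z value that has `area` of the distribution lying to its left."""
    return std_normal.inv_cdf(area)

def two_tailed_z(area):
    """Positive z value such that `area` lies between -z and +z."""
    return std_normal.inv_cdf(0.5 + area / 2)

for area in (0.68, 0.95, 0.997):
    print(f"{area:.1%}: one-tailed z = {one_tailed_z(area):.2f}, "
          f"two-tailed z = ±{two_tailed_z(area):.2f}")
```

The computed values agree with the table values to within rounding, e.g. the two-tailed z for 68% comes out just under 1.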
Population Sampling
• Population sampling is the process of taking a subset of
subjects that is representative of the entire population.
• The sample must have sufficient size to warrant statistical
analysis.
• Sampling is usually done because it is impossible to test
every single individual in the population.
• It is also done to save time, money and effort while
conducting the research.
Ref: https://explorable.com/population-sampling
Sampling Analytics
• Suppose we have a dataset or random variable:
X = 1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.5, 1.6, 1.4, 1.7, 1.8, 1.9, 2.1, 1.8
• If we use all of the values of X to find its mean, variance, standard deviation, skewness
and other statistical parameters, then X is termed the population.
(This is what we did before the mids.)
• Now imagine a situation in which, for any reason, you do not take all the values of X but
only a few of them, say a subset of X; e.g. you randomly take X1 = 1.2, 1.3, 1.5, 1.2.
This subset X1 is what we call a sample of X.
• Furthermore, if we want to generalize the results of the statistical analysis performed on
X1 to X, i.e. we want to learn about the population (X) from the sample (X1), then this
process is called statistical inference.
• This is the research process and is the topic of discussion after the mids.
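The population/sample distinction can be illustrated with the slide's own numbers. This is a minimal sketch using Python's standard library; the extra sample `X2` and the seed value are assumptions added only for illustration.

```python
import random
from statistics import mean

# Population X: all 14 values listed on the slide.
X = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.5, 1.6, 1.4, 1.7, 1.8, 1.9, 2.1, 1.8]

# Sample X1: the subset quoted on the slide.
X1 = [1.2, 1.3, 1.5, 1.2]

population_mean = mean(X)   # uses every value, so it describes the population
sample_mean = mean(X1)      # uses only the subset, so it is an estimate

# Another random sample of the same size (seed 0 is an arbitrary assumption).
rng = random.Random(0)
X2 = rng.sample(X, k=4)

print(f"population mean = {population_mean:.3f}")
print(f"sample mean of X1 = {sample_mean:.3f}")
print(f"another sample X2 = {X2}, mean = {mean(X2):.3f}")
```

Different samples give different sample means, which is why inference from a sample to the population needs the tools developed in the rest of the lecture.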
Research Process
Ref: https://towardsdatascience.com/what-is-the-difference-between-population-and-sample-e13d17746b16
Naming Conventions
• For a population we usually use Greek letters as symbols, whereas for samples we use
English letters to indicate statistical parameters.
• If the same symbol is used for both, the subscript must specify whether
it refers to the population or the sample.
• The table below shows some examples.
Statistical Parameter     Population     Sample
Mean                      $\mu$          $\bar{x}$
Variance                  $\sigma^2$     $s^2$
Standard Deviation        $\sigma$       $s$
Correlation               $\rho$         $r$
Regression Coefficient    $\beta$        $b$
Skewness
Calculation Methods
• Note that some calculation methods differ between a population and a sample.
• Let's consider the mean, variance and standard deviation.
Statistical Parameter    Population Calculation                            Sample Calculation
Mean                     $\mu = \frac{\sum x}{N}$                          $\bar{x} = \frac{\sum x}{N}$
Variance                 $\sigma^2 = \frac{\sum (x-\mu)^2}{N}$             $s^2 = \frac{\sum (x-\bar{x})^2}{N-1}$
Standard Deviation       $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}$        $s = \sqrt{\frac{\sum (x-\bar{x})^2}{N-1}}$
Remarks                  Here $N$ is the population size                   Here $N$ is the sample size
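The difference between the two variance calculations (dividing by N for a population, by N-1 for a sample) can be sketched as follows. This is an illustrative sketch; the function names are assumptions, and the data reuse the sample X1 = 1.2, 1.3, 1.5, 1.2 from the Sampling Analytics slide.

```python
from math import sqrt

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

X1 = [1.2, 1.3, 1.5, 1.2]

print(f"treated as a population: variance = {population_variance(X1):.4f}, "
      f"sd = {sqrt(population_variance(X1)):.4f}")
print(f"treated as a sample:     variance = {sample_variance(X1):.4f}, "
      f"sd = {sqrt(sample_variance(X1)):.4f}")
```

For the same data the sample formula always gives the larger value, because it divides the same sum of squares by a smaller number.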
Calculation Methods
• The denominators $N$ and $N-1$ in the sample formulae are
called the degrees of freedom, or DF.
• It is important to understand the concept of DF. It
comes up many times in statistical inference.
• DF is the number of values that are free to change
or vary in a calculation.
Degree of Freedom
• In statistics we follow a simple rule.
• The degree of freedom in the calculation of a statistical
parameter is equal to N-(total number of other estimates
required to calculate that statistical parameter).
• Since the sample mean calculation requires no other estimate,
its DF=N.
• However, the sample variance calculation requires the mean
estimate $\bar{x}$, therefore its DF=N-1.
Degree of Freedom
• Note:
Using the idea of DF, statistical formulae can be summarized. For
example, the variance formula can be summarized as:
$$s^2 = \frac{SS}{DF} = \frac{\sum (x-\bar{x})^2}{N-1}$$
where SS is the sum of squared deviations from the mean.
The above notation of variance in terms of SS and DF is very
commonly used in ANOVA, Correlation and Regression (topics to
study later).
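The SS/DF form of the variance can be sketched in a few lines. This is a minimal illustration; the helper names are assumptions, and the data reuse the sample X1 = 1.2, 1.3, 1.5, 1.2 from the Sampling Analytics slide.

```python
def sum_of_squares(values):
    """SS: sum of squared deviations of each value from the mean."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)

def variance_from_ss(values, df):
    """Variance written as SS / DF."""
    return sum_of_squares(values) / df

X1 = [1.2, 1.3, 1.5, 1.2]
n = len(X1)

ss = sum_of_squares(X1)
sample_var = variance_from_ss(X1, df=n - 1)   # sample variance: DF = N - 1
population_var = variance_from_ss(X1, df=n)   # population formula: denominator N

print(f"SS = {ss:.4f}, SS/(N-1) = {sample_var:.4f}, SS/N = {population_var:.4f}")
```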
Sampling Distribution
• A sampling distribution is a frequency or
probability distribution of a particular statistical
parameter across every possible sample of size $N$
taken from the population.
• If the sampling distribution shows
the probability distribution of the means of all
possible samples from a population, it is
called the sampling distribution of means.
Sampling Distribution of Mean
• In this case the dataset variable or random variable is the mean ($\bar{x}$), and it contains
all the values that are the means of all possible samples from the
population.
• Suppose there is a population from which 36 samples of size $N$
can be taken. The means of all those 36 samples are then taken as values
of a random variable ($\bar{x}$). We call this random variable the sampling
distribution of means. It can be written, for example, as:
$$\bar{x} = \bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_{36}$$
• We can further summarize the sampling distribution in terms of
the frequency distribution of $\bar{x}$, using grouped statistics that show the
frequencies of the values present in $\bar{x}$.
Central Limit Theorem (CLT)
• This is one of the many important theorems in statistical inference.
The CLT can be understood as follows.
1) The central limit theorem states that if you have a population with mean $\mu$ and standard deviation $\sigma$ and take sufficiently
large random samples from the population with replacement, then the distribution of the sample means will be
approximately normal.
For the assumption of a normal sampling distribution to hold, the sample size must be large (as a rule of thumb, greater than 30). The
larger the better.
2) The expected value of the sampling distribution of the mean is the population mean ($\mu$).
3) The standard deviation of this normal distribution is called the standard error (SE). For the sampling distribution of the mean, it is called
the standard error of the mean (SEM).
Every standard error has its own formula, depending on the statistic it describes. The SEM for samples of size $N$ and
population standard deviation $\sigma$ is given as follows:
$$SEM = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$
=> standard deviation of the sampling distribution of means
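The three CLT statements can be checked with a small simulation. This is only a sketch under stated assumptions: the population (10,000 values spread uniformly between 1 and 5), the sample size, the number of samples and the seed are all hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical, clearly non-normal population.
population = [rng.uniform(1, 5) for _ in range(10_000)]
mu = fmean(population)
sigma = pstdev(population)

N = 36               # size of each sample
num_samples = 5_000  # number of samples, each drawn with replacement

sample_means = [fmean(rng.choices(population, k=N)) for _ in range(num_samples)]

mean_of_means = fmean(sample_means)  # point 2: should be close to mu
sem_observed = pstdev(sample_means)  # point 3: should be close to sigma / sqrt(N)
sem_theory = sigma / sqrt(N)

print(f"mu = {mu:.4f}, mean of sample means = {mean_of_means:.4f}")
print(f"sigma/sqrt(N) = {sem_theory:.4f}, sd of sample means = {sem_observed:.4f}")
```

A histogram of `sample_means` would show the bell shape described in point 1, even though the population itself is flat.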
Z- Scores
• This means that the usual z score formula for a
normal random variable X, which was given as
$$z = \frac{x-\mu}{\sigma}$$
• will change for the sampling distribution of the mean.
It is now given as
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{N}} = \frac{\bar{x}-\mu}{SEM}$$
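The two z formulas differ only in the denominator. The sketch below contrasts them; the function names are illustrative, and the numbers are borrowed from Problem 1 later in these slides (a sample mean of 62 from 120 trucks, population mean 60, population standard deviation 12.5).

```python
from math import sqrt

def z_for_value(x, mu, sigma):
    """z score of a single observation x."""
    return (x - mu) / sigma

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z score of a sample mean: the denominator is the SEM, sigma / sqrt(n)."""
    sem = sigma / sqrt(n)
    return (x_bar - mu) / sem

z_single = z_for_value(62, mu=60, sigma=12.5)
z_mean = z_for_sample_mean(62, mu=60, sigma=12.5, n=120)

print(f"z for one truck at 62 mph:        {z_single:.2f}")
print(f"z for a mean of 62 mph, N = 120:  {z_mean:.2f}")
```

The same distance from the population mean is far more unusual for a sample mean than for a single observation, because the SEM shrinks as N grows.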
Z Distribution vs. T Distribution
• According to the above, to follow the CLT the sample size must be very large, or we
should have all the possible samples from the population. In both
situations we should also know the population standard deviation.
• If the sample size is not very large (less than 30) and/or the population standard
deviation is not known, then the frequency distribution of the sampling
distribution of the mean follows an approximately normal distribution, which is
called the t distribution.
• The SEM for a t distribution, using the sample standard deviation $s$, is given as
$$SEM = \frac{s}{\sqrt{N}}$$
T- Score
• We now obtain a t statistic, not a z statistic. It is
approximately normally distributed and is given as
$$t = \frac{\bar{x}-\mu}{s/\sqrt{N}}$$
• The DF for the sampling distribution of the mean is $N-1$.
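A t statistic can be computed directly from a small sample. This is a minimal sketch; the function name is illustrative, the data reuse the sample X1 = 1.2, 1.3, 1.5, 1.2 from the Sampling Analytics slide, and the value 1.5 used for the population mean is a hypothetical choice, not a figure from the lecture.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (t, DF) for a sample mean, using the sample standard deviation."""
    n = len(sample)
    sem = stdev(sample) / sqrt(n)  # stdev divides by N - 1
    return (mean(sample) - mu) / sem, n - 1

X1 = [1.2, 1.3, 1.5, 1.2]
t, df = t_statistic(X1, mu=1.5)  # mu = 1.5 is an assumed population mean

print(f"t = {t:.4f} with DF = {df}")
```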
Z Distributions and
t Distributions
• We use the z table to solve normal distribution
problems.
• We use the t table to solve t distribution problems.
CLT Example
Example
Population Mean: $\mu = (1+2+3+4+5+6)/6 = 3.5$
Population Variance: $\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{(1-3.5)^2 + (2-3.5)^2 + \cdots + (6-3.5)^2}{6} = 2.92$
i.e. Population Standard Deviation is: $\sigma = \sqrt{2.92} = 1.71$
Sampling Distribution of Means
Frequency Distribution of Sampling
Distribution of Mean
Random Variable ($\bar{x}$)    Frequency    Probability
1.0                            1            1/36
1.5                            2            2/36
2.0                            3            3/36
2.5                            4            4/36
3.0                            5            5/36
3.5                            6            6/36
4.0                            5            5/36
4.5                            4            4/36
5.0                            3            3/36
5.5                            2            2/36
6.0                            1            1/36
Total                          36           1
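Proof of CLT
1) The frequency distribution of the sampling distribution of the mean is approximately normal: it is symmetric and peaks at the centre.
2) Expected value of $\bar{x}$ = $\mu$
$$E(\bar{x}) = \sum \bar{x}\,P(\bar{x}) = 1.0\left(\tfrac{1}{36}\right) + 1.5\left(\tfrac{2}{36}\right) + 2.0\left(\tfrac{3}{36}\right) + \cdots + 6.0\left(\tfrac{1}{36}\right) = \tfrac{126}{36} = 3.5 = \mu$$
3) Variance of the Sampling Distribution of Means ($\sigma_{\bar{x}}^2$)
$$\sigma_{\bar{x}}^2 = \sum (\bar{x}-\mu)^2\,P(\bar{x}) = (1.0-3.5)^2\left(\tfrac{1}{36}\right) + (1.5-3.5)^2\left(\tfrac{2}{36}\right) + \cdots + (6.0-3.5)^2\left(\tfrac{1}{36}\right) = \tfrac{52.5}{36} = 1.46$$
On continuing with the calculation and taking the square root of $\sigma_{\bar{x}}^2$ (here every sample has size $N=2$),
$$SEM = \sigma_{\bar{x}} = \sqrt{1.46} = 1.21 = \frac{\sigma}{\sqrt{N}} = \frac{1.71}{\sqrt{2}}$$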
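The calculation can be verified by enumerating every possible sample. This is a minimal sketch, assuming the 36 samples are all ordered pairs (size 2, drawn with replacement) from the population 1 to 6, which is what the frequency table implies; exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]
N = 2  # assumed sample size: 6 * 6 = 36 ordered samples with replacement

samples = list(product(population, repeat=N))
sample_means = [Fraction(sum(s), N) for s in samples]
frequency = Counter(sample_means)

mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

expected_mean = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

for m in sorted(frequency):
    print(f"mean {float(m):.1f}: frequency {frequency[m]}")
print(f"E(mean) = {float(expected_mean)}, population mean = {float(mu)}")
print(f"variance of means = {float(var_of_means):.4f}, "
      f"sigma^2 / N = {float(sigma_sq / N):.4f}")
```

The variance of the sample means equals the population variance divided by N exactly, so its square root is the SEM.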
Estimation of Mean and
Confidence Intervals
Business Statistics/ Statistical
Inference
Example
Population Mean: $\mu = (1+2+3+4+5+6)/6 = 3.5$
Population Variance: $\sigma^2 = \frac{\sum (x-\mu)^2}{N} = 2.92$
i.e. Population Standard Deviation is: $\sigma = \sqrt{2.92} = 1.71$
Example
Can we estimate this population mean ($\mu$) using a
random sample from the sampling distribution of
means?
Population Mean Estimation
Earlier we saw how z scores and t scores are written. They can be rearranged as
follows to estimate the population mean.
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm z\,\frac{\sigma}{\sqrt{N}}$$
$$t = \frac{\bar{x}-\mu}{s/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm t\,\frac{s}{\sqrt{N}}$$
A correct estimate depends very much on the value of z or t that we take. The z score is used
when the population standard deviation is known. The t score is used when the population standard
deviation is not known, so that our only option is to use the sample standard deviation.
Estimates are always made as intervals. They are called interval estimates.
Population Mean Estimation
• Suppose we have a sample from this population
with sample mean $\bar{x}$.
• In this example the population
standard deviation is known to us, which gives $SEM = \sigma/\sqrt{N}$.
• Therefore, using the above equation, the population mean
is estimated as follows.
Population Mean Estimation
$$\mu = \bar{x} \pm z\,\frac{\sigma}{\sqrt{N}} = \bar{x} \pm z \cdot SEM$$
So the question is: what value of z do we take?
The answer depends on how confident we want to be that the
area of the normal distribution we use contains our
population mean. This level of confidence defines
our confidence interval (CI): are we 90%, 95% or 99%
confident? The higher the confidence level,
the wider the interval and the more likely it is to contain the population mean.
Population Mean Estimation
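• Since we already know that $\mu$ has the value 3.5, we can verify
whether our estimates are correct or incorrect.
• Let's start with a 68% confidence interval (CI). For a 68% CI, we have z=±1.
• Hence the estimation equation becomes:
$\mu = \bar{x} \pm (1)\,\frac{\sigma}{\sqrt{N}}$ => this gives two limits of $\mu$, which are
lower limit of $\mu$ = $\bar{x} - \frac{\sigma}{\sqrt{N}}$
upper limit of $\mu$ = $\bar{x} + \frac{\sigma}{\sqrt{N}}$, i.e. the interval of our estimate of $\mu$ comes out as follows:
$$\bar{x} - \frac{\sigma}{\sqrt{N}} \le \mu \le \bar{x} + \frac{\sigma}{\sqrt{N}}$$
Is this interval estimate of $\mu$ correct? No, because we know that our $\mu = 3.5$ lies outside this interval.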
Population Mean Estimation
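• Let's start changing the z value, or the CI%. Suppose we now estimate $\mu$
at 95% CI, for which z=±1.96.
• Then, using the same procedure, we get
$\mu = \bar{x} \pm (1.96)\,\frac{\sigma}{\sqrt{N}}$ => this gives two limits of $\mu$, which are
lower limit of $\mu$ = $\bar{x} - 1.96\,\frac{\sigma}{\sqrt{N}}$
upper limit of $\mu$ = $\bar{x} + 1.96\,\frac{\sigma}{\sqrt{N}}$, i.e. the interval of our estimate of $\mu$ comes out as follows:
$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{N}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{N}}$$
Is this interval estimate of $\mu$ correct? Yes, because we know that
our $\mu = 3.5$ lies inside this interval.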
Population Mean Estimation
• The higher the CI%, the larger
the area covered and hence the higher
the probability that the interval contains the population
mean.
• The % value of a CI, for example a 95% CI, also tells us
that out of 100 samples, the intervals built from about 95 of them
will contain the population mean.
Note:
A confidence interval for the mean is a
range of scores constructed such that the
population mean will fall within this
range in 95% of samples.
The confidence interval is not an interval
within which we are 95% confident that
the population mean will fall.
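The "in 95% of samples" reading can be illustrated by building an interval from every possible sample and counting how many intervals contain the population mean. This is a sketch under stated assumptions: it reuses the population 1 to 6 with all 36 samples of size 2 (as implied by the frequency table earlier), and the function names are illustrative. Because this population is small and not normal, the coverage is only close to the nominal percentage.

```python
from itertools import product
from math import sqrt

population = [1, 2, 3, 4, 5, 6]
N = 2  # assumed sample size, giving 36 possible samples

mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))
sem = sigma / sqrt(N)

def z_interval(x_bar, z):
    """Interval estimate x_bar ± z * SEM."""
    return x_bar - z * sem, x_bar + z * sem

def coverage(z):
    """Fraction of all possible samples whose interval contains mu."""
    means = [sum(s) / N for s in product(population, repeat=N)]
    hits = 0
    for m in means:
        low, high = z_interval(m, z)
        if low <= mu <= high:
            hits += 1
    return hits / len(means)

print(f"z = 1.00: {coverage(1.0):.1%} of intervals contain mu")
print(f"z = 1.96: {coverage(1.96):.1%} of intervals contain mu")
```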
Practice Examples
Business Statistics/ Statistical
Inference
Example Problems
Problem 1
You are the Director of Transportation Safety. You are
concerned because the average highway speed of all trucks
may exceed the 60 miles per hour speed limit. A random
sample of 120 trucks shows a mean speed of 62 mph.
Assuming that the population mean is 60 mph and the population
standard deviation is 12.5 mph, find the probability that the
average speed is greater than or equal to 62 mph.
Solution
• We are provided with the sample information: sample size $N = 120$, sample mean (speed)
$\bar{x} = 62$ mph, population mean $\mu = 60$ mph and population standard deviation $\sigma = 12.5$ mph.
• Using this information, we are required to find the probability that the average
speed is greater than or equal to 62 mph, i.e. $P(\bar{x} \ge 62)$.
• Since a probability (area) is required, we first calculate the z value and
then use the z table to find the required probability.
• We are given a sample mean,
which must have been taken from the sampling distribution of the mean, and
therefore we use the following z formula to solve for z:
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{N}}$$
Graph of the Problem
[Figure: Sampling Distribution of Mean over Sample Mean Values, with $\mu=60$ and $\bar{x}=62$ marked; the Required Area or Probability lies to the right of $\bar{x}=62$.]
Solution
• Converting into z gives:
$$z = \frac{62-60}{12.5/\sqrt{120}} = \frac{2}{1.141} = 1.75$$
Graph of the Problem
[Figure: Standard Sampling Distribution of Mean over Standard Values, with $\mu=0$ and $z=1.75$ marked; the Required Area or Probability lies to the right of $z=1.75$.]
Solution
• Since we cannot directly find ≥ probabilities
from our z table, this problem is
solved as follows:
$$P(\bar{x} \ge 62) = P(z \ge 1.75) = 1 - P(z \le 1.75)$$
• Looking into the z table we get:
$$P(\bar{x} \ge 62) = P(z \ge 1.75) = 1 - P(z \le 1.75) = 1 - 0.9599 = 0.0401$$
Answer
4.01% of the time, a random sample of 120
trucks from the population will yield a mean
speed of 62 mph or more.
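The table lookup for Problem 1 can be cross-checked numerically. This is a minimal sketch assuming Python's standard `statistics.NormalDist`; it computes the probability both with the unrounded z and with z rounded to two decimals, as a z table requires.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, x_bar = 60, 12.5, 120, 62

sem = sigma / sqrt(n)      # standard error of the mean
z = (x_bar - mu) / sem     # z score of the sample mean

std_normal = NormalDist()
p_exact = 1 - std_normal.cdf(z)            # unrounded z
p_table = 1 - std_normal.cdf(round(z, 2))  # z rounded as in a z table

print(f"SEM = {sem:.4f}, z = {z:.4f}")
print(f"P(mean >= 62) with unrounded z: {p_exact:.4f}")
print(f"P(mean >= 62) with z = {round(z, 2)}: {p_table:.4f}")
```

The small gap between the two probabilities comes only from rounding z before the lookup.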
Problem 2
A study involving stress is done on a college campus among the students. The stress
scores follow a uniform distribution with the lowest stress score equal to 1 and highest
equal to 5. The population mean is 3 and population standard deviation is 1.154.
Using a sample of 75 students, find:
a) The probability that mean stress score of the sample of 75 students is less than 2
b) The 90th percentile for the mean stress score for a sample of 75 students
Solution (a)
• We are provided with the sample information: sample size $N = 75$.
• Using this information, we are required to find the probability that the average or
mean stress score is less than or equal to 2, i.e. $P(\bar{x} \le 2)$.
• Since a probability (area) is required, we first calculate the z value and
then use the z table to find the required probability.
• We are given a target sample
mean, which must have been taken from the sampling distribution of the mean, and
therefore we use the following z formula to solve for z:
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{N}}$$
Graph of the Problem
[Figure: Sampling Distribution of Mean over Sample Mean Values, with $\bar{x}=2$ and $\mu=3$ marked; the Required Area or Probability lies to the left of $\bar{x}=2$.]
Solution
• Converting into z gives:
$$z = \frac{2-3}{1.154/\sqrt{75}} \approx -7.504$$
Graph of the Problem
[Figure: Standard Sampling Distribution of Mean over Standard Values, with $z=-7.504$ and $\mu=z=0$ marked; the Required Area or Probability lies to the left of $z=-7.504$.]
Solution
• Although we cannot find $z=-7.504$ in our z table,
we can see from the table that the more
negative a z value is, the smaller its area will be.
Therefore,
$$P(\bar{x} \le 2) = P(z \le -7.504) \approx 0$$
Answer (a)
The probability that the mean stress score is less than 2 is about 0.
Solution (b)
For the 90th percentile, we take the one-tailed area from the
left-hand side. The z value that gives this area
is shown in the following graph and z table:
[Figure: graph and z table showing the z value for a left-hand area of 0.90.]
Solution
• For the 90th percentile the z value is $z = 1.28$.
Hence the corresponding sample mean $\bar{x}$ for the 90th
percentile is calculated as follows:
$$\bar{x} = \mu + z\,\frac{\sigma}{\sqrt{N}} = 3 + 1.28\left(\frac{1.154}{\sqrt{75}}\right) = 3.17$$
Answer (b)
The 90th percentile of the sample means of stress
scores ($\bar{x}$) is about 3.17. This tells us that 90% of all
sample means ($\bar{x}$) are at most 3.17 and 10% are at
least 3.17.
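Both parts of Problem 2 can be cross-checked numerically. This is a minimal sketch assuming Python's standard `statistics.NormalDist`, with the sampling distribution of the mean modelled as a normal distribution with mean 3 and standard deviation equal to the SEM.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3, 1.154, 75

sem = sigma / sqrt(n)
sampling_dist = NormalDist(mu, sem)  # sampling distribution of the mean

# Part (a): probability that the sample mean is at most 2.
z_a = (2 - mu) / sem
p_a = sampling_dist.cdf(2)

# Part (b): 90th percentile of the sample mean.
x_90 = sampling_dist.inv_cdf(0.90)

print(f"SEM = {sem:.4f}")
print(f"(a) z = {z_a:.3f}, P(mean <= 2) = {p_a:.2e}")
print(f"(b) 90th percentile of the sample mean = {x_90:.2f}")
```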
Problem 3
Suppose that a market research analyst for a cell phone company conducts a study of their
customers who exceed the time allowance included in their basic cell phone contract. The
analyst finds that for those people who exceeded the time included in their basic contract, the
excess time used follows an exponential distribution with a mean of 22 minutes. In a negative
exponential distribution the mean and standard deviation are the same. Consider a random sample of
80 customers who have exceeded the time allowance included in their basic cell phone contract.
a) Find the probability that the mean excess time used by the 80 customers in the sample is
longer than 20 minutes.
b) Find the 95th percentile of the sample mean excess time for samples of 80 customers
who exceed their basic contract time.
Solution (a)
• We are provided with the sample information: sample size $N = 80$.
• Using this information, we are required to find the probability that the average
or mean excess time is greater than or equal to 20 minutes, i.e. $P(\bar{x} \ge 20)$.
• Since a probability (area) is required, we first calculate the z value and
then use the z table to find the required probability.
• We are given a target sample mean,
which must have been taken from the sampling distribution of the mean, and
therefore we use the following z formula to solve for z:
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{N}}$$
Solution
• Since the question states that the excess time used
follows an exponential distribution with a mean of 22
minutes, the population is exponentially
distributed with population mean $\mu = 22$.
• Note that in a negative exponential
distribution the mean and standard deviation are the same, i.e. in
our case we have
$\mu = \sigma = 22$.
Graph of the Problem
[Figure: Sampling Distribution of Mean over Sample Mean Values, with $\bar{x}=20$ and $\mu=22$ marked; the Required Area or Probability lies to the right of $\bar{x}=20$.]
Solution
• Converting into z gives:
$$z = \frac{20-22}{22/\sqrt{80}} = \frac{-2}{2.4597} = -0.8131$$
Graph of the Problem
[Figure: Standard Sampling Distribution of Mean over Standard Values, with $z=-0.8131$ and $\mu=0$ marked; the Required Area or Probability lies to the right of $z=-0.8131$.]
Solution
• Since we cannot directly find ≥ probabilities
from our z table, this problem is
solved as follows:
$$P(\bar{x} \ge 20) = P(z \ge -0.8131) = 1 - P(z \le -0.8131)$$
• Looking into the z table we get:
$$P(\bar{x} \ge 20) = P(z \ge -0.8131) = 1 - P(z \le -0.8131) = 1 - 0.2090 = 0.791$$
Answer (a)
The probability is 0.791 or 79.1% that the mean
excess time is more than 20 minutes for a
sample of 80 customers who exceed their
contracted time allowance.
Solution (b)
For the 95th percentile, we take the one-tailed area from the
left-hand side. The z value that gives this area
is shown in the following graph and z table:
[Figure: graph and z table showing the z value for a left-hand area of 0.95.]
Solution
• For the 95th percentile the z value is $z = 1.65$.
Hence the corresponding sample mean $\bar{x}$ for the 95th
percentile is calculated as follows:
$$\bar{x} = \mu + z\,\frac{\sigma}{\sqrt{N}} = 22 + 1.65\left(\frac{22}{\sqrt{80}}\right) = 26.058$$
Answer (b)
The 95th percentile of the sample means of excess time
($\bar{x}$) is about 26.058. This tells us that 95% of all sample
means ($\bar{x}$) are at most 26.058 and 5% are at least
26.058.
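Both parts of Problem 3 can be cross-checked numerically. This is a minimal sketch assuming Python's standard `statistics.NormalDist`; it also shows how the table value z = 1.65 leads to 26.058, while an unrounded z gives a slightly smaller percentile.

```python
from math import sqrt
from statistics import NormalDist

mu = sigma = 22  # exponential population: mean equals standard deviation
n = 80

sem = sigma / sqrt(n)
sampling_dist = NormalDist(mu, sem)  # sampling distribution of the mean

# Part (a): probability that the sample mean is at least 20 minutes.
z_a = (20 - mu) / sem
p_a = 1 - sampling_dist.cdf(20)

# Part (b): 95th percentile of the sample mean.
x_95_exact = sampling_dist.inv_cdf(0.95)  # unrounded z, about 1.645
x_95_table = mu + 1.65 * sem              # z = 1.65 read from a z table

print(f"SEM = {sem:.4f}")
print(f"(a) z = {z_a:.4f}, P(mean >= 20) = {p_a:.4f}")
print(f"(b) 95th percentile: {x_95_table:.3f} with z = 1.65, "
      f"{x_95_exact:.3f} with unrounded z")
```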
Problem 4
a) Suppose scores on exams in Statistics are normally distributed with an
unknown population mean and a population standard deviation of 3 points.
A random sample of scores with sample size equal to 36 gives a sample
mean of 68. Find a confidence interval (CI) estimate for the population
mean exam score at 95% CI.
b) Solve (a) again, but this time the 3 points are the sample
standard deviation instead of the population standard deviation.
Solution (a)
• This is an estimation problem.
• The population mean is to be estimated.
• Since the population standard deviation is known, we use the
following formula for population mean estimation:
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm z\,\frac{\sigma}{\sqrt{N}}$$
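Solution
$$\mu = \bar{x} \pm z\,\frac{\sigma}{\sqrt{N}}$$
Here sample mean $\bar{x} = 68$
Population standard deviation $\sigma = 3$
Sample size $N = 36$
For 95% CI and interval estimation, $z = \pm 1.96$
Substituting all the above values in the estimation equation
gives
$$\mu = 68 \pm 1.96\left(\frac{3}{\sqrt{36}}\right) = 68 \pm 0.98$$
i.e. lower limit estimate of $\mu$ = 68 - 0.98 = 67.02
And upper limit estimate of $\mu$ = 68 + 0.98 = 68.98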
Answer (a)
The interval estimate of the population mean at 95%
CI is as follows:
$$67.02 \le \mu \le 68.98$$
Solution (b)
• This is an estimation problem.
• The population mean is to be estimated.
• Since the population standard deviation is not known but the sample
standard deviation is known, we use the following formula for
population mean estimation:
$$t = \frac{\bar{x}-\mu}{s/\sqrt{N}} \Rightarrow \mu = \bar{x} \pm t\,\frac{s}{\sqrt{N}}$$
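Solution
$$\mu = \bar{x} \pm t\,\frac{s}{\sqrt{N}}$$
Here sample mean $\bar{x} = 68$
Sample standard deviation $s = 3$
Sample size $N = 36$, so $DF = N - 1 = 35$
For 95% CI and interval estimation, $t = \pm 2.030$
Substituting all the above values in the estimation equation
gives
$$\mu = 68 \pm 2.030\left(\frac{3}{\sqrt{36}}\right) = 68 \pm 1.015$$
i.e. lower limit estimate of $\mu$ = 68 - 1.015 = 66.985
And upper limit estimate of $\mu$ = 68 + 1.015 = 69.015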
Answer (b)
The interval estimate of the population mean at 95%
CI is as follows:
$$66.985 \le \mu \le 69.015$$
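The two interval estimates of Problem 4 can be compared side by side. This is a minimal sketch using only Python's standard library. The z critical value is computed with `statistics.NormalDist`; the standard library has no t distribution, so the t critical value of 2.030 for DF = 35 (two-tailed, 95%) is hard-coded as an assumption taken from a standard t table. The helper name `interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 68  # sample mean
sd = 3      # population sd in part (a), sample sd in part (b)
n = 36
sem = sd / sqrt(n)

def interval(center, critical_value, standard_error):
    """Interval estimate: center ± critical value * standard error."""
    margin = critical_value * standard_error
    return center - margin, center + margin

# Part (a): population standard deviation known, so use z.
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-tailed 95% CI
z_low, z_high = interval(x_bar, z_crit, sem)

# Part (b): only the sample standard deviation is known, so use t with DF = 35.
T_CRIT_DF35 = 2.030  # assumption: value read from a t table, not computed here
t_low, t_high = interval(x_bar, T_CRIT_DF35, sem)

print(f"(a) z interval: {z_low:.2f} to {z_high:.2f}")
print(f"(b) t interval: {t_low:.3f} to {t_high:.3f}")
```

The t interval is slightly wider than the z interval, which reflects the extra uncertainty of estimating the standard deviation from the sample.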
The t value reference
Answer 5
Therefore 62 students should be surveyed in
order to be 95% confident that we are within 2
years of the true population mean age of
University students.