
Sampling Distributions & Confidence Intervals

This document discusses sampling distributions and confidence intervals. It covers the central limit theorem and how sampling distributions can be used to estimate population parameters even when the parent distribution is unknown. It also discusses how confidence intervals provide a range estimate for population parameters like the mean and proportion, using the sampling distribution and establishing upper and lower bounds within which the parameter is likely to fall based on the chosen confidence level. Point estimates and bias of estimators are also covered.


AB1202

Statistics and Analysis


Lecture 4
Sampling Distributions and Confidence Intervals
Chin Chee Kai
[email protected]
Nanyang Business School
Nanyang Technological University
NBS 2016S1 AB1202 CCK-STAT-018

Sampling Distributions and Confidence Intervals

Sampling Distributions
• The Need for Random Sampling
• Sampling Distribution of Sample Mean
• Central Limit Theorem (CLT)
• Sampling Distribution of Sample Proportion

Confidence Intervals
• Point and Interval Estimation
• Confidence Intervals For Population Mean
• Confidence Intervals For Population Proportion 𝑝
• Determination of Sample Size

The Need for Random Sampling


• Sampling plays a crucial role in our daily lives because:
▫ There is a reality out there that we cannot “see” directly.
▫ We “see” only through instruments, like eyes, measurement tools, and sensors.
▫ All instruments, including our senses, report measurements with errors.
   ▪ For example, a ruler whose real length is fixed at 100 mm would, in practice, be measured as 100.5 mm at one time and as 99.3 mm at another.

• It affects all businesses, and how we make decisions.

Sampling Distribution of Sample Mean


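• Let $X \sim N(\mu, \sigma^2)$ be a population distribution. If a sample of size $n$ is drawn from $X$, then the sample mean, $\bar{X}$, will follow:
$$\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$$
where $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ if $\sigma$ is known. This becomes:
$$t = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim t_{\nu} \quad \text{with d.f. } \nu = n - 1$$
where $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{s}{\sqrt{n}}$ if $\sigma$ is unknown.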
• Note that there are 2 distributions to keep in mind when we talk about a sampling distribution:
▫ The parent distribution $X \sim N(\mu, \sigma^2)$, and
▫ The sampling distribution $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$
• We want to study parent distribution, but cannot do so directly. So
we sample, and derive conclusions about parent distribution
through sampling distribution.
• Conclusions are often related to the population parameters 𝜇 and 𝜎.

5-Million-Ball Example
• If we take each ball one at a time, measure and record
its diameter, we get: X=9.8, 10.1, 8.3, 9.5, 10.2, …, 8.2
• So the entire population is known.
• If the average is 10 and s.d. is 2, then $\mu = 10$, $\sigma = 2$, with $X \sim N(10, 2^2)$.
• But suppose we don’t know $\mu$.
• We sample 50 balls. Measuring their diameters and averaging them gives, say, $\bar{x} = 9.7$.
• We repeat such 50-ball averaging indefinitely to get $\bar{X}$.
• Sampling distribution: $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$ where $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.
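
The repeated 50-ball averaging described above can be imitated on a computer. The following is only a minimal sketch using Python's standard library: it assumes the diameters really are normal with $\mu = 10$ and $\sigma = 2$, and the function name, seed, and number of repeats are illustrative choices, not part of the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(mu=10.0, sigma=2.0, n=50, repeats=2000, seed=1):
    """Draw `repeats` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]

means = simulate_sample_means()
print("mean of sample means:", statistics.fmean(means))   # should be close to mu = 10
print("s.d. of sample means:", statistics.stdev(means))   # should be close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", 2.0 / math.sqrt(50))
```

With enough repeats, the spread of the simulated means should sit near $\sigma / \sqrt{n} \approx 0.283$, much narrower than the parent s.d. of 2.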

Central Limit Theorem (CLT)


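• Given a population with distribution $X$, mean $\mu$ and standard deviation $\sigma$, the sampling distribution $\bar{X}$ of sample size $n$ will tend towards a normal distribution with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$ as the sample size becomes large ($n \ge 30$):
$$\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$$
where $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.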

• Note that CLT works for non-normal parent distributions, as long as sample size $n$ is large and the mean $\mu$ and standard deviation $\sigma$ of the parent population are both defined.
(Recall that some distributions may have an undefined standard deviation. Also, “defined” does not mean “known”.)
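
To see the CLT at work on a clearly non-normal parent, one can simulate sample means from a skewed distribution. The sketch below is an illustration under assumptions of my own choosing: an exponential parent with $\mu = 1$ and $\sigma = 1$, sample size $n = 40$, and 4000 repeats. It checks how often the sample mean lands within $1.96\,\sigma/\sqrt{n}$ of $\mu$, which would be about 95% if the sampling distribution were exactly normal.

```python
import math
import random
import statistics

def exponential_sample_means(n=40, repeats=4000, seed=7):
    """Sample means from an exponential parent with mean 1 and s.d. 1."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

n = 40
means = exponential_sample_means(n=n)
se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

# Fraction of sample means within 1.96 standard errors of the true mean
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)

print("mean of sample means:", statistics.fmean(means))
print("s.d. of sample means:", statistics.stdev(means), "vs", se)
print("fraction within 1.96 s.e.:", inside)
```

The parent here is strongly right-skewed, yet the fraction printed should come out near 0.95, which is the practical content of the theorem.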

Sampling Distribution of Sample Proportion


• If $X$ is a Bernoulli distribution with outcomes being 0 or 1 (i.e., it is a binomial distribution with $n = 1$ and success probability $p$), then its mean is $\mu = p$ and standard deviation is $\sigma = \sqrt{p(1 - p)}$.
• If we take a sample of size $n$ (where $n$ is large), the sampling distribution of the sample proportion $\bar{X}$ will, by CLT, be approximately:
$$\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$$
where $\mu_{\bar{X}} = p$ and $\sigma_{\bar{X}} = \sqrt{\dfrac{p(1 - p)}{n}}$.
• This means that a population in which a proportion $p$ is our desired event will have a sampling distribution $\bar{X}$ as given above.
• This allows us to estimate or determine a population proportion through sampling.

5-Million-Ball Example
• If we take each ball one at a time and write 1 for black, 0 for white (so that the mean of $X$ is the proportion of black balls), we get: X=0,1,0,0,1,0,0,0,0,1,…,0
The entire population is known with 0.5M black balls.
• If there are 1.5M black balls, then $p = 0.3$ and $X \sim \mathrm{Bernoulli}(0.3)$.
• But suppose we don’t know $p$.
• We sample 50 balls. If we count 20 black balls, then $\hat{p} = 0.4$.
• We repeat such 50-ball proportioning indefinitely to get $\bar{X}$.
• Sampling distribution: $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$ where $\mu_{\bar{X}} = p$ and $\sigma_{\bar{X}} = \sqrt{\dfrac{p(1 - p)}{n}}$.
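
As a quick check of the numbers in this example, the standard deviation of the sampling distribution can be worked out directly from $p = 0.3$ and $n = 50$. The last line is an added illustration, not from the slide: it expresses how far the single observed proportion of 0.4 lies from $p$ in units of $\sigma_{\bar{X}}$.

```latex
\sigma_{\bar{X}} = \sqrt{\frac{p(1-p)}{n}}
                 = \sqrt{\frac{0.3 \times 0.7}{50}}
                 = \sqrt{0.0042}
                 \approx 0.0648

\frac{0.4 - 0.3}{0.0648} \approx 1.54 \ \text{standard deviations above } p
```

So a sample proportion of 0.4 from a population with $p = 0.3$ is somewhat high but not unusual for a sample of only 50 balls.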

Point Estimation
• Recall that our main objective is to determine
parameters of population – numbers which are
assumed to exist, are constant, and are also
expensive to determine directly.
• Point estimates are single-point (i.e., a plain number) approximations of a population parameter.
▫ E.g.: We take 3 sample measurements and find a sample mean $\bar{x} = 7.5$ cm. We could “point-estimate” population mean $\mu$ to be 7.5 cm (when actually it could be 8 cm or something else).
• We can also point-estimate population variance $\sigma^2$ using sample variance $s^2$.
▫ E.g.: Measuring a sample variance $s^2 = 5.2$, we “point-estimate” population variance $\sigma^2$ to be also 5.2.

Bias of Point Estimators


• Point estimator: a function of sample values that produces an approximation of a population parameter.
• May be unbiased or biased
▫ depending on whether or not its expected value equals the population parameter being estimated.
• E.g.: $\bar{x}$ as an estimator of $\mu$ is unbiased; $s^2$ as an estimator of $\sigma^2$ is unbiased.
• E.g.: $s$ as an estimator of $\sigma$ is biased (but we use it anyway).
• Bias is a property of the point estimator, not a judgment about how useful the point estimator is.
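
The contrast between $s^2$ (unbiased for $\sigma^2$) and $s$ (biased for $\sigma$) can be seen numerically by averaging many estimates. This is a rough sketch under assumed settings: a normal parent with $\sigma = 2$, very small samples of $n = 5$ so the bias is visible, and 10000 repeats. The function name is hypothetical. `statistics.variance` divides by $n - 1$, matching the usual sample variance.

```python
import random
import statistics

def average_estimates(sigma=2.0, n=5, repeats=10000, seed=3):
    """Return the average of s^2 and the average of s over many small samples."""
    rng = random.Random(seed)
    var_estimates = []
    sd_estimates = []
    for _ in range(repeats):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        var_estimates.append(s2)
        sd_estimates.append(s2 ** 0.5)    # sample standard deviation s
    return statistics.fmean(var_estimates), statistics.fmean(sd_estimates)

mean_s2, mean_s = average_estimates()
print("average s^2:", mean_s2, "(sigma^2 = 4)")
print("average s:  ", mean_s, "(sigma = 2)")
```

The average of $s^2$ should settle close to 4, while the average of $s$ should stay noticeably below 2, which is the bias the slide refers to.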

Confidence Interval (CI)


• Other than point estimates, we can also give a range estimate of a population parameter. An even better estimate is a confidence interval.
• Confidence Interval (CI) has 3 numbers: $(L, H)$ and a confidence level, e.g. 95%.
▫ $L$ = Lower limit of interval
▫ $H$ = Higher limit of interval
▫ Confidence level = amount of belief that the intended population parameter is captured within the interval from $L$ to $H$. Practical values might be > 80%.
▫ $H - L$ is called width (of interval).
▫ Half of width is called many things, like error, tolerance, margin, margin of error, maximum deviation, etc.
• CI is obtained from sampling, so it is no surprise that it involves the sampling distribution.

CI For Population Mean, Known 𝜎


• Given a confidence level $C\%$ (e.g. $C = 95$) with known population standard deviation $\sigma$, CI for population mean $\mu$ is:
$$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$$
▫ $n$ = sample size
▫ $Z$ = inverse value of right-tail probability $\dfrac{1}{2}\left(1 - \dfrac{C}{100}\right)$ from standard normal distribution $N(0, 1^2)$.
• Note that:
▫ For small $n$ ($n < 30$), we would require assumption of normality for parent population.
▫ For large $n$, it still works for non-normal parent distribution, thanks to CLT.
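
The known-$\sigma$ interval above can be computed in a few lines. This is a minimal sketch using `statistics.NormalDist` from Python's standard library; the function name `z_interval` is hypothetical, and the inputs reuse the ball example ($\bar{x} = 9.7$, $\sigma = 2$, $n = 50$) purely for illustration.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf_percent=95.0):
    """Confidence interval for the mean when the population s.d. is known."""
    right_tail = 0.5 * (1.0 - conf_percent / 100.0)  # e.g. 0.025 for C = 95
    z = NormalDist().inv_cdf(1.0 - right_tail)       # value with that right-tail area
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=9.7, sigma=2.0, n=50)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```

For these inputs $Z \approx 1.96$ and the margin is about 0.55, so the interval runs from roughly 9.15 to 10.25.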

CI For Population Mean, Unknown 𝜎


• Given a confidence level $C\%$ but population standard deviation is not known, CI for population mean $\mu$ is:
$$\bar{x} \pm t \frac{s}{\sqrt{n}}$$
▫ $n$ = sample size, $s$ = sample standard deviation
▫ $t$ = inverse value of right-tail probability $\dfrac{1}{2}\left(1 - \dfrac{C}{100}\right)$ from Student-t distribution of degree of freedom $\nu = n - 1$.
• Note that:
▫ Assumption of normality (or at least approximate normality) of parent population is necessary.
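
A short worked case may help fix the unknown-$\sigma$ formula. The numbers below are hypothetical, chosen only for illustration: $n = 10$, $\bar{x} = 7.5$, $s = 2$, and $C = 95$. The critical value $t \approx 2.262$ is the standard t-table entry for right-tail probability 0.025 with $\nu = 9$.

```latex
\nu = n - 1 = 9, \qquad t_{0.025,\,9} \approx 2.262

\bar{x} \pm t \frac{s}{\sqrt{n}}
  = 7.5 \pm 2.262 \times \frac{2}{\sqrt{10}}
  \approx 7.5 \pm 1.43

\Rightarrow \ (L, H) \approx (6.07,\ 8.93)
```

Had $\sigma = 2$ been known, the multiplier would have been $Z \approx 1.96$ instead of 2.262, giving a narrower interval. The wider t interval reflects the extra uncertainty from estimating $\sigma$ with $s$.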

CI For Population Proportion 𝑝


• Given a confidence level $C\%$, CI for population proportion $p$ is:
$$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
▫ $n$ = sample size, $\hat{p}$ = sample proportion
▫ $Z$ = inverse value of right-tail probability $\dfrac{1}{2}\left(1 - \dfrac{C}{100}\right)$ from standard normal distribution $N(0, 1^2)$.
• Note that:
▫ Assumption of normality of parent population is not appropriate.
▫ So we typically need to ensure that $n$ is large.
▫ CLT then allows us to assume sampling distribution is normal.
▫ Rule of thumb: $n$ is large if both $np \ge 5$ and $n(1 - p) \ge 5$.
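
The proportion interval and its rule of thumb can be sketched as follows. The function name is hypothetical, and since $p$ is unknown in practice, this sketch applies the rule of thumb to the sample proportion instead, which is an assumption on my part rather than something stated in the lecture. The inputs reuse the earlier example of 20 black balls in a sample of 50.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, conf_percent=95.0):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Rule of thumb from the slide, checked with p_hat because p is unknown
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    right_tail = 0.5 * (1.0 - conf_percent / 100.0)
    z = NormalDist().inv_cdf(1.0 - right_tail)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(successes=20, n=50)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

For 20 out of 50 the margin is about 0.136, so the interval is roughly 0.26 to 0.54, which happens to contain the true $p = 0.3$ of that example.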

Determination of Sample Size


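• Notice by now that CIs are generally in the form:
$$\bar{x} \pm V \cdot \frac{\sigma}{\sqrt{n}}$$
where $V$ is the appropriate value from the appropriate sampling distribution.
• Thus, $L = \bar{x} - V \cdot \dfrac{\sigma}{\sqrt{n}}$ and $H = \bar{x} + V \cdot \dfrac{\sigma}{\sqrt{n}}$
• So: $\dfrac{H + L}{2} = \bar{x}$, and $\dfrac{H - L}{2} = V \cdot \dfrac{\sigma}{\sqrt{n}}$
• If you are given tolerance $B = \dfrac{H - L}{2}$ and are required to find minimum $n$ to achieve $B$ at confidence level $C\%$, then:
$$n = \left(\frac{V_{C\%} \cdot \sigma}{B}\right)^2$$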
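
The sample-size formula can be turned into a small helper. This sketch assumes the known-$\sigma$ case, so $V_{C\%}$ is the standard normal value $Z$, and it rounds the result up to the next whole number because a sample size must be an integer. The function name and the example inputs ($\sigma = 2$, $B = 0.5$) are hypothetical.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, tolerance, conf_percent=95.0):
    """Smallest n whose margin of error Z * sigma / sqrt(n) does not exceed the tolerance."""
    right_tail = 0.5 * (1.0 - conf_percent / 100.0)
    z = NormalDist().inv_cdf(1.0 - right_tail)
    n_exact = (z * sigma / tolerance) ** 2
    return math.ceil(n_exact)  # round up so the tolerance is actually met

n_needed = min_sample_size(sigma=2.0, tolerance=0.5)
print("minimum n:", n_needed)
```

Here $(1.96 \times 2 / 0.5)^2 \approx 61.5$, so at least 62 observations are needed. Halving the tolerance roughly quadruples the required $n$, since $n$ grows with $1/B^2$.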
