Sampling Distributions
COMM 162
Week 5
Updated: Feb 5, 2025
Agenda
• Review: sampling
• Sampling error
• The CLT and sampling distributions
  • of the mean when the population SD is known
  • of the mean when the population SD is unknown (Student’s t)
  • of the proportion
Cheat Sheet for *.DIST functions
Prob.       Discrete (e.g., Binomial, Poisson)    Continuous (e.g., Normal)
            Excel function                        Excel function
P(X = 8)    =*.DIST(8, …, FALSE)                  =0
P(X ≤ 8)    =*.DIST(8, …, TRUE)                   =*.DIST(8, …, TRUE)
P(X < 8)    =*.DIST(7, …, TRUE)                   =*.DIST(8, …, TRUE)
P(X ≥ 8)    =1 - *.DIST(7, …, TRUE)               =1 - *.DIST(8, …, TRUE)
P(X > 8)    =1 - *.DIST(8, …, TRUE)               =1 - *.DIST(8, …, TRUE)
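The discrete/continuous difference in the cheat sheet can be checked numerically. The following is a minimal Python sketch, not the course's Excel workflow. The Binomial(n = 20, p = 0.3) and Normal(mean 6, SD 2) parameters are made up for illustration, and `binom_pmf` / `binom_cdf` are helper names introduced here.

```python
from math import comb
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for a Binomial(n, p) random variable."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Hypothetical discrete example: X ~ Binomial(20, 0.3)
n, p = 20, 0.3
discrete = {
    "P(X = 8)": binom_pmf(8, n, p),
    "P(X <= 8)": binom_cdf(8, n, p),
    "P(X < 8)": binom_cdf(7, n, p),       # note the 7
    "P(X >= 8)": 1 - binom_cdf(7, n, p),  # note the 7
    "P(X > 8)": 1 - binom_cdf(8, n, p),
}

# Hypothetical continuous example: X ~ Normal(mean=6, sd=2)
normal = NormalDist(mu=6, sigma=2)
continuous = {
    "P(X = 8)": 0.0,                      # a single point has probability 0
    "P(X <= 8)": normal.cdf(8),
    "P(X < 8)": normal.cdf(8),            # same as <= for continuous
    "P(X >= 8)": 1 - normal.cdf(8),
    "P(X > 8)": 1 - normal.cdf(8),
}

for label in discrete:
    print(f"{label:10s} discrete={discrete[label]:.4f}  continuous={continuous[label]:.4f}")
```

For the discrete case, moving from ≤ to < changes the argument from 8 to 7; for the continuous case it changes nothing.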
Sampling
What is Sampling?
• Recall: a population is all individuals or items under consideration
• E.g., Incomes of every family in Ontario
• E.g., Height of every human on Earth
• Census: a study of all units of the population
• Sample: a study of only a portion of the population
Why Sample?
• To contact the whole population would be time consuming
• E.g.: A candidate for public office may wish to determine her chances for
election. A sample poll using the regular staff and field interviews of a
professional polling firm would take only one or two days. It could take years
to contact all the voting population!
• The cost of studying all the items in a population may be prohibitive
• E.g.: Public opinion polls and consumer testing organizations usually contact
only a small portion of the population since it is more cost effective than
contacting the entire population.
Why Sample?
• The physical impossibility of checking all items in the population.
• E.g.: Some populations are infinite. It would be impossible to check all the water in
Okanagan Lake for bacterial levels, so we select samples at various locations.
• The destructive nature of some tests
• E.g.: If the wine tasters in Niagara-on-the-Lake drank all the wine to evaluate the
vintage, they would consume the entire crop, and none would be available for sale.
• The sample results are adequate
• E.g.: Even if funds are available, it is doubtful the additional accuracy of a 100%
sample—that is, studying the entire population—is essential in most problems. For
example, the federal government uses a sample of grocery stores scattered
throughout Canada to determine the monthly index of food prices.
Sampling Error
• Sampling Error: the difference between a sample statistic and the
corresponding population parameter
• Since the sample is only part of the population, it is unlikely the sample mean would be
exactly equal to the population mean
• The same is true for variance, standard deviation, etc.
• Increasing the sample size will help to reduce sampling error
[Figure: two groups of cars, one with an average car price of $24,265 and the other with an average car price of $23,602]
Example: Comfort Inn
• Jimmy and Johnny operate an 8-room B&B called the Comfort Inn
• During September, there were 95 rentals over 30 nights: µ = 3.17 rooms per night, σ = 1.9
• We take two random samples (n = 5) and compute the sampling error

September  Rentals    September  Rentals    September  Rentals
1          8          11         6          21         1
2          4          12         2          22         2
3          3          13         4          23         3
4          2          14         4          24         6
5          4          15         0          25         7
6          3          16         0          26         4
7          2          17         5          27         1
8          3          18         3          28         3
9          4          19         3          29         3
10         0          20         2          30         3

• S1 = [5, 3, 7, 3, 4] rooms rented
  • $\bar{x} = 4.4$
  • Sampling error = 4.4 – 3.17 = 1.23
• S2 = [3, 3, 2, 3, 6] rooms rented
  • $\bar{x} = 3.4$
  • Sampling error = 3.4 – 3.17 = 0.23
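The Comfort Inn arithmetic can be reproduced in a few lines. This is a minimal Python sketch using the September table and the two samples from the slide; the variable names are introduced here for illustration.

```python
from statistics import mean

# Rooms rented on each of the 30 nights in September (from the table)
rentals = [
    8, 4, 3, 2, 4, 3, 2, 3, 4, 0,   # nights 1-10
    6, 2, 4, 4, 0, 0, 5, 3, 3, 2,   # nights 11-20
    1, 2, 3, 6, 7, 4, 1, 3, 3, 3,   # nights 21-30
]
mu = mean(rentals)  # population mean, 95 / 30

samples = {
    "S1": [5, 3, 7, 3, 4],
    "S2": [3, 3, 2, 3, 6],
}

# Sampling error = sample statistic - population parameter
errors = {name: mean(values) - mu for name, values in samples.items()}

print(f"population mean = {mu:.2f}")
for name, values in samples.items():
    print(f"{name}: sample mean = {mean(values):.1f}, sampling error = {errors[name]:.2f}")
```

The errors printed here use the unrounded population mean, so they agree with the slide's values after rounding to two decimals.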
Non-sampling Errors
• All errors other than sampling error
• Missing data
• Recording errors
• Measurement errors
• Input processing errors
• Analysis errors
• Response bias
• Non-response bias
• And many more!
The Central Limit Theorem
Example: Pickleball Players
• Below are the details of a small pickleball team (the population):
• The mean age is µ=62.8
Name Age
Bobby 54
Karla 55
Alec 59
Tyrese 63
Nikolai 64
Lewis 68
Cora 69
Payton 70
Example: Ages of Pickleball Players
• Suppose we took samples of n=2 players:
Sample  Players         Values    Mean ($\bar{x}$)  Sampling Error        Note
1       Bobby, Karla    [54, 55]  54.5              54.5 – 62.8 = -8.3    “Unlucky” sample
2       Cora, Payton    [69, 70]  69.5              69.5 – 62.8 = 6.7
3       Alec, Nikolai   [59, 64]  61.5              61.5 – 62.8 = -1.3    Better
…
Example: Ages of Pickleball Players
• Suppose we took all possible samples of n=2
Sample  Players          Values    Mean ($\bar{x}$)  Sampling Error
1       Bobby, Karla     [54, 55]  54.5              54.5 – 62.8 = -8.3
2       Cora, Payton     [69, 70]  69.5              69.5 – 62.8 = 6.7
3       Alec, Nikolai    [59, 64]  61.5              61.5 – 62.8 = -1.3
4       Bobby, Alec      [54, 59]  56.5              56.5 – 62.8 = -6.3
5       Bobby, Tyrese    [54, 63]  58.5              58.5 – 62.8 = -4.3
6       Bobby, Nikolai   [54, 64]  59.0              59.0 – 62.8 = -3.8
…
28      Payton, Lewis    [70, 68]  69.0              69.0 – 62.8 = 6.2
• What can we say about the distribution of these sample means?
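All 28 possible samples of two players can be enumerated directly. This is a minimal Python sketch using the ages from the team table; the sample order produced by `itertools.combinations` will differ from the numbering on the slide.

```python
from itertools import combinations
from statistics import mean

ages = {
    "Bobby": 54, "Karla": 55, "Alec": 59, "Tyrese": 63,
    "Nikolai": 64, "Lewis": 68, "Cora": 69, "Payton": 70,
}
mu = mean(list(ages.values()))  # population mean, 62.75 (62.8 on the slide)

# Every possible sample of n = 2 players, without replacement
pairs = list(combinations(ages.items(), 2))
sample_means = [mean([age_a, age_b]) for (_, age_a), (_, age_b) in pairs]

print(f"number of samples: {len(sample_means)}")
print(f"smallest / largest sample mean: {min(sample_means)} / {max(sample_means)}")
print(f"mean of the sample means: {mean(sample_means)}")
print(f"population mean: {mu}")
```

Individual sample means range widely, but their average lands on the population mean.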
Example: Ages of Pickleball Players
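• Distribution of the sample means:
[Figure: two plots. One recalls the original (population) distribution. The other shows the distribution of the sample means, called the sampling distribution of the sample mean: it looks different than the original distribution, and it looks Normal!]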
The Central Limit Theorem
• If sufficiently large (n > 30) samples are repeatedly drawn, the sample means are always approximately normally distributed
  • Regardless of the shape of the population!
  • If the population is normal, then any sample size works (don’t need “large”)
Another Example
[Figure: plots of six individual samples (Sample #1 to Sample #6), the distribution of the 6 sample means, and the distribution of 10,006 sample means]
Source: [Link]
Why is CLT Cool?
• Even if we don’t know anything about the population, we can get a good guess of its mean (along with a confidence interval)!
• If you took all possible samples, the mean of the sample means would be the same as the population mean
  • $\mu_{\bar{x}} = \mu$
• The standard deviation of the sample means is:
  • $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
  • Sometimes called the standard error of the mean
  • As n gets larger, $\sigma_{\bar{x}}$ gets smaller → each sample’s mean tends to be closer to the population mean
• Also: we will see later that the CLT allows us to do some useful statistical comparisons
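Both properties of the sample means can be seen in a small simulation. This is a rough Python sketch under assumed settings: the skewed (exponential) population, its size, the seed, n = 40 and the 5,000 repetitions are all arbitrary choices made here, not values from the course.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(162)  # arbitrary seed so the run is repeatable

# Hypothetical, clearly non-normal population (right-skewed)
population = [random.expovariate(1 / 50) for _ in range(100_000)]
mu = fmean(population)
sigma = pstdev(population)

n = 40  # sample size, large enough for the n > 30 rule of thumb
sample_means = [fmean(random.sample(population, n)) for _ in range(5_000)]

print(f"population mean      : {mu:.2f}")
print(f"mean of sample means : {fmean(sample_means):.2f}")
print(f"sigma / sqrt(n)      : {sigma / sqrt(n):.2f}")
print(f"SD of sample means   : {pstdev(sample_means):.2f}")
```

With only 5,000 of the possible samples the match is approximate, not exact; plotting `sample_means` as a histogram would show the bell shape even though the population is skewed.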
Applying the CLT
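• Using the CLT, $\bar{X}$ is (approximately) a normal random variable:
  • $\bar{X} \sim N\left(\mu_{\bar{x}} = \mu_x,\ \sigma_{\bar{x}} = \sigma_x / \sqrt{n}\right)$ (we assume we know $\sigma_x$)
• And we can answer probability questions:
  • “What is the probability that a sample of 35, drawn from a population with a mean of 50 and a standard deviation of 3, has a mean of at most 54?”
• Also, we can standardize it using the same transformation as before
  • $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_x / \sqrt{n}}$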
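The quoted probability question can be answered either directly or through the z transformation. This is a minimal Python sketch, assuming the 50 and 3 in the question are the population mean and standard deviation; `statistics.NormalDist` stands in for Excel here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 3, 35     # assumed population mean/SD and sample size
x_bar = 54                   # "a mean of at most 54"

standard_error = sigma / sqrt(n)

# Route 1: use the sampling distribution of the mean directly
p_direct = NormalDist(mu, standard_error).cdf(x_bar)

# Route 2: standardize, then use the standard normal distribution
z = (x_bar - mu) / standard_error
p_standardized = NormalDist().cdf(z)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.2f}")
print(f"P(sample mean <= 54) = {p_direct:.6f}")
```

Both routes give the same probability. Because 54 is almost eight standard errors above the mean, the answer is essentially 1.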
Example: Grocery Store
• At a grocery store, the customer spend is 𝜇 = 85.00, 𝜎 = 9.00
• What is the probability that the next 40 customers spend an average
of $87.00 or more?
Recap: Using the CLT with Means
• Check if CLT applies (i.e., n > 30)
  • If not, STOP
• Call NORM.DIST($\bar{x}$, $\mu$, $\sigma_x / \sqrt{n}$, TRUE) as appropriate
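The recap can be sketched as a small function and tried on the grocery store example. This is a minimal Python sketch, not the Excel approach used in class: `prob_sample_mean_at_most` is a name introduced here, and it plays the role of the cumulative normal call with the standard error as the SD.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_at_most(x_bar, mu, sigma, n):
    """P(sample mean <= x_bar) under the CLT, with known population sigma."""
    # Step 1: check the rule of thumb. (The slides note that any n works
    # if the population itself is normal; this sketch does not model that.)
    if n <= 30:
        raise ValueError("CLT rule of thumb not met: need n > 30")
    # Step 2: cumulative normal probability using the standard error
    return NormalDist(mu, sigma / sqrt(n)).cdf(x_bar)

# Grocery store: mu = 85.00, sigma = 9.00, next 40 customers,
# probability that their average spend is $87.00 or more
p_at_least_87 = 1 - prob_sample_mean_at_most(87.00, 85.00, 9.00, 40)
print(f"P(sample mean >= 87) = {p_at_least_87:.4f}")  # about 0.08
```

The "or more" in the question is why the result is subtracted from 1, matching the "as appropriate" in the recap.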
Quick Check 1: Department Store
• Suppose that during any hour in a large department store, the
average number of shoppers is 448 with 𝜎 = 21
• What is the probability that a random sample of 49 different hours
will yield a sample mean between 441 and 446 shoppers?
Finite Population Correction Factor
• If the population is small (“finite population”), and the sample is more than 5% of it (n/N > 0.05), then you should apply the finite population correction factor:
  • $\bar{X} \sim N\left(\mu_{\bar{x}} = \mu_x,\ \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}\right)$
Example: Production Company
• Has 350 employees. Average age is 37.6, 𝜎 = 8.3
• If a random sample of 45 employees is taken, what is the probability
that the sample mean will be less than 40?
New Recap: Using the CLT with Means
• Check if CLT applies (i.e., n > 30)
  • If not, STOP
• Check if FPCF needs to be applied (i.e., n/N >= 5%)
  • If yes, call NORM.DIST($\bar{x}$, $\mu$, $\frac{\sigma_x}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$, TRUE) as appropriate
  • If not, call NORM.DIST($\bar{x}$, $\mu$, $\sigma_x / \sqrt{n}$, TRUE) as appropriate
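The extended recap can be sketched the same way and applied to the production company example (N = 350, n = 45, so n/N is about 13%). This is a minimal Python sketch; `standard_error` and `prob_sample_mean_at_most` are names introduced here, and the optional `N` argument is a design choice of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the FPCF when n/N >= 5%."""
    se = sigma / sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

def prob_sample_mean_at_most(x_bar, mu, sigma, n, N=None):
    """P(sample mean <= x_bar) under the CLT, with known population sigma."""
    if n <= 30:
        raise ValueError("CLT rule of thumb not met: need n > 30")
    return NormalDist(mu, standard_error(sigma, n, N)).cdf(x_bar)

# Production company: 350 employees, mean age 37.6, sigma = 8.3, sample of 45
p_below_40 = prob_sample_mean_at_most(40, 37.6, 8.3, 45, N=350)
print(f"standard error with FPCF    = {standard_error(8.3, 45, 350):.4f}")
print(f"standard error without FPCF = {standard_error(8.3, 45):.4f}")
print(f"P(sample mean < 40) = {p_below_40:.4f}")  # about 0.98
```

The correction shrinks the standard error, since a sample that covers a sizeable share of the population leaves less room for sampling error.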
Quick Check 2: White German Shepherds
• There are 4000 White German Shepherds in the USA
• The mean weight is 75.45 pounds, with a std dev of 10.37 pounds
• We take a sample of 100 of these dogs
• What is the probability that the sample’s mean weight is within +/- 2
pounds of the population’s mean?
Quick Check 2: White German Shepherds (Part B)
• There are 1000 White German Shepherds in the USA
• The mean weight is 75.45 pounds, with a std dev of 10.37 pounds
• We take a sample of 100 of these dogs
• What is the probability that the sample’s mean weight is within +/- 2
pounds of the population’s mean?
Sampling Distributions of $\bar{x}$ (with unknown $\sigma$)
→ Student’s t-Distribution
t-Distribution to the Rescue
• If we know the population σ, we can use NORM.DIST to make some conclusions ☺
• If we don’t know the population σ, we can’t use NORM.DIST to make any conclusions
• However, instead of the normal distribution, we can use a t-distribution ☺
  • T.DIST()
  • Similar to normal, just a little wider
[Figure: the Z and t curves, both centred at 0, with t a little wider than Z]
t-Distribution
• Similar to z, with s instead of $\sigma_x$: $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
• Bell shaped and symmetrical
• Wider than the normal to account for the uncertainty associated with s
• Degrees of freedom (df): n - 1
• As df gets bigger and bigger, t looks more and more like Z (as $df \to \infty$, $t \to Z$)
[Figure: the Z curve with t curves for df = 150 and df = 15, all centred at 0]
Example: The Dean’s Claim
• The Dean claims that new graduates make $800 per week
• You are skeptical and want to double-check
• You do a survey of 25 new grads and ask their weekly salary
• $\bar{x} = 750$, $s = 89$
• Are your findings consistent with the Dean’s claim?
• That is: What is the probability that a sample mean of 750 or lower would be
found if the population mean really was 800?
• P($\bar{x}$ <= 750) = ?
Recap: Using the t-Distribution
• Calculate t
  • Check if FPCF needs to be applied (i.e., n/N >= 5%)
  • If yes: $t = \frac{\bar{x} - \mu_{\bar{x}}}{\frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}}$
  • If no: $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
• Calculate df (i.e., n-1)
• Call T.DIST(t, df, TRUE) as appropriate
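These steps can be tried on the Dean's claim (claimed mean 800, survey of n = 25 with sample mean 750 and s = 89). The sketch below is plain Python. Its standard library has no t-distribution, so `t_cdf` is a helper written here that integrates the t density numerically with Simpson's rule; it is a stand-in for the cumulative t call, not how Excel computes it. No population size is given in the example, so no FPCF is applied.

```python
from math import exp, lgamma, pi, sqrt

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom (Simpson's rule)."""
    # Normalizing constant of the t density, computed via log-gamma
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    # Integrate the density from 0 to t (signed), then add P(T <= 0) = 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

# Dean's claim
mu_claimed, x_bar, s, n = 800, 750, 89, 25
t = (x_bar - mu_claimed) / (s / sqrt(n))
df = n - 1
p = t_cdf(t, df)

print(f"t = {t:.3f}, df = {df}")
print(f"P(sample mean <= 750 if the mean really were 800) = {p:.4f}")  # about 0.005
```

A probability this small says a sample mean of 750 or lower would be very unusual if the population mean really were 800.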
Quick Check 3: The Dean’s Claim, Part 2
• “Oops! I meant $760 per week,” says the Dean.
• Using the same survey of 25 new grads, what can we conclude now?
Quick Check 4: Burger Prices
• Your buddy claims the average burger price is $7
• You are skeptical and want to double-check
• You go to 16 restaurants and look at the prices
• $\bar{x} = 9.50$, $s = 2.00$
• What can you conclude?
Sampling Distribution of the Sample Proportion
Sampling Distribution of The Proportion
• For proportions, i.e., the nominal (“countable”) scale of measurement
• A proportion (denoted p) is a fraction, ratio, or percent indicating the part of the sample or the population having a particular characteristic
  • E.g., if 42 out of 60 students are female, we say the proportion is $p = \frac{42}{60} = 0.70$
• The proportion of a sample is denoted $\hat{p}$
• Must check if the CLT applies:
  • n*p must be > 5
  • n*(1-p) must be > 5
• If so, the sampling distribution of the sample proportion is (approximately) normal:
  • $\hat{p} \sim N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\right)$, can use $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
Example: Is Bob Right?
• “What % of our parts are defective?” asks the boss.
• “Only 10%, ma'am!” claims Bob.
• Is Bob right? Let’s see.
• The boss randomly selects 80 parts and finds that 12 parts are defective.
What are the chances?
Sampling Distribution of The Proportion
If we have a finite population, a finite population correction factor (FPCF) should be applied when n/N > 0.05:
$\hat{p} \sim N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}\right)$
Recap: Using the CLT with Proportions
• Check if CLT applies (i.e., n*p > 5 and n*(1-p) > 5)
  • If not, STOP
• Check if FPCF needs to be applied (i.e., n/N >= 5%)
  • If yes, call NORM.DIST($\hat{p}$, $p$, $\sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$, TRUE) as appropriate
  • If not, call NORM.DIST($\hat{p}$, $p$, $\sqrt{\frac{p(1-p)}{n}}$, TRUE) as appropriate
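The proportion recap can be sketched as a function and tried on the "Is Bob Right?" example (claimed p = 10%, 12 defective parts in a sample of 80). This is a minimal Python sketch; `prob_sample_proportion_at_most` is a name introduced here, and the optional `N` argument is a design choice of this sketch. No population size is given for Bob's parts, so the FPCF branch is not used.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_at_most(p_hat, p, n, N=None):
    """P(sample proportion <= p_hat) using the normal approximation."""
    # Step 1: check that the CLT applies
    if not (n * p > 5 and n * (1 - p) > 5):
        raise ValueError("CLT check failed: need n*p > 5 and n*(1-p) > 5")
    # Step 2: standard error, with the FPCF when n/N >= 5%
    se = sqrt(p * (1 - p) / n)
    if N is not None and n / N >= 0.05:
        se *= sqrt((N - n) / (N - 1))
    # Step 3: cumulative normal probability
    return NormalDist(p, se).cdf(p_hat)

# Bob claims p = 0.10; the boss finds 12 defective parts out of 80
p_hat = 12 / 80
chance = 1 - prob_sample_proportion_at_most(p_hat, 0.10, 80)
print(f"sample proportion = {p_hat:.2f}")
print(f"P(sample proportion >= 0.15 if p = 0.10) = {chance:.4f}")  # about 0.07
```

The chance of seeing 15% or more defective parts in the sample, if Bob's 10% were true, comes out at roughly 7%.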
Quick Check 5: Toronto Rental Market
• The rental market in Toronto is huge with 1,000,000 units
• Suppose that 20% of single units rent for more than $2480 / month.
Let’s call those units high-priced units
• p = population proportion of high-priced units
• $\hat{p}$ = sample proportion of high-priced units
• If you select a random sample of 30 units:
a. Can you apply the CLT?
b. How would the sample proportion of high-priced units be distributed?
c. How likely is it that, in the sample, the proportion of high-priced units is more
than 25%?
Quick Check 6: Election
• Angie is considering running for mayor of her Fine Town for the
second time
• The population size of Fine Town is 5000
• The first time she received 75% of the popular vote
• What is the probability that in a sample of 300 town residents, at
least 240 would vote in favour of her for town mayor the
second time?
More Practice
• [See ‘COMM 162 0 Week 5 – [Link]’]
Summary
• CLT: the distribution of the sample mean is approximately normal (if the sample size is big enough)
• Sampling a mean when σ is known: use the Z-distribution, $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_x / \sqrt{n}}$
• Sampling a mean when σ is unknown: use the t-distribution, $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
• Sampling a proportion: use $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$