COMM162 - Week 05 - Sampling

Sampling Distributions
COMM 162
Week 5
Updated: Feb 5, 2025
2

Agenda

• Review: sampling
• Sampling error
• The CLT and sampling distributions
• Sampling distribution of the mean when the population SD is known
• Sampling distribution of the mean when the population SD is unknown (Student’s t)
• Sampling distribution of the proportion
3

Cheat Sheet for *.DIST functions


Probability   Discrete (e.g., Binomial, Poisson)   Continuous (e.g., Normal)
P(X = 8)      =*.DIST(8, …, FALSE)                 =0
P(X ≤ 8)      =*.DIST(8, …, TRUE)                  =*.DIST(8, …, TRUE)
P(X < 8)      =*.DIST(7, …, TRUE)                  =*.DIST(8, …, TRUE)
P(X ≥ 8)      =1 - *.DIST(7, …, TRUE)              =1 - *.DIST(8, …, TRUE)
P(X > 8)      =1 - *.DIST(8, …, TRUE)              =1 - *.DIST(8, …, TRUE)
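The same bookkeeping can be sketched in Python. This is only an illustration: the Binomial(n=20, p=0.4) and Normal(mean 8, SD 2) parameters are made up, and `binom_cdf` is a hypothetical helper standing in for a cumulative *.DIST call.

```python
from math import comb
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); stands in for a cumulative *.DIST call."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 20, 0.4  # hypothetical discrete distribution

# Discrete: "< 8" means "<= 7", so the cutoff shifts by one.
p_le_8 = binom_cdf(8, n, p)       # P(X <= 8)
p_lt_8 = binom_cdf(7, n, p)       # P(X < 8)
p_ge_8 = 1 - binom_cdf(7, n, p)   # P(X >= 8)
p_gt_8 = 1 - binom_cdf(8, n, p)   # P(X > 8)
p_eq_8 = p_le_8 - p_lt_8          # P(X = 8)

# Continuous: P(X = 8) is 0, so "<" and "<=" give the same answer.
normal = NormalDist(mu=8, sigma=2)  # hypothetical continuous distribution
c_le_8 = normal.cdf(8)              # P(X <= 8) and also P(X < 8)
c_ge_8 = 1 - normal.cdf(8)          # P(X >= 8) and also P(X > 8)
```

The point of the sketch is the one-step shift (8 versus 7) in the discrete column, which never appears in the continuous column.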


4

Sampling
5

What is Sampling?
• Recall: a population is all individuals or items under consideration
• E.g., Incomes of every family in Ontario
• E.g., Height of every human on Earth
• Census: a study of all units of the population
• Sample: a study of only a portion of the population
6

Why Sample?
• To contact the whole population would be time consuming
• E.g.: A candidate for public office may wish to determine her chances for
election. A sample poll using the regular staff and field interviews of a
professional polling firm would take only one or two days. It could take years
to contact all the voting population!
• The cost of studying all the items in a population may be prohibitive
• E.g.: Public opinion polls and consumer testing organizations usually contact
only a small portion of the population since it is more cost effective than
contacting the entire population.
7

Why Sample?
• The physical impossibility of checking all items in the population.
• E.g.: Some populations are infinite. It would be impossible to check all the water in
Okanagan Lake for bacterial levels, so we select samples at various locations.
• The destructive nature of some tests
• E.g.: If the wine tasters in Niagara-on-the-Lake drank all the wine to evaluate the
vintage, they would consume the entire crop, and none would be available for sale.
• The sample results are adequate
• E.g.: Even if funds are available, it is doubtful the additional accuracy of a 100%
sample—that is, studying the entire population—is essential in most problems. For
example, the federal government uses a sample of grocery stores scattered
throughout Canada to determine the monthly index of food prices.
8

Sampling Error
9

Sampling Error
• Sampling Error: the difference between a sample statistic and the
corresponding population parameter
• Since the sample is only part of the population, it is unlikely the sample mean would be
exactly equal to the population mean
• The same is true for variance, standard deviation, etc.

• Increasing the sample size will help to reduce sampling error

Average car price: $24,265 Average car price: $23,602


10

Example: Comfort Inn


• Jimmy and Johnny operate an 8-room B&B called the Comfort Inn
• During September, there were N=95 rentals, µ=3.17 per night, σ=1.9
• We take two random samples (n=5) and compute the sampling error

September  Rentals    September  Rentals    September  Rentals
1          8          11         6          21         1
2          4          12         2          22         2
3          3          13         4          23         3
4          2          14         4          24         6
5          4          15         0          25         7
6          3          16         0          26         4
7          2          17         5          27         1
8          3          18         3          28         3
9          4          19         3          29         3
10         0          20         2          30         3

• S1 = [5, 3, 7, 3, 4] rooms rented
• $\bar{x}$ = 4.4
• Sampling error = 4.4 – 3.17 = 1.23
• S2 = [3, 3, 2, 3, 6] rooms rented
• $\bar{x}$ = 3.4
• Sampling error = 3.4 – 3.17 = 0.23
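The two sampling errors can be reproduced with a short Python sketch. The sample values and µ come from the slide; the variable names are only illustrative.

```python
from statistics import mean

mu = 3.17  # population mean rentals per night, from the slide

samples = {
    "S1": [5, 3, 7, 3, 4],
    "S2": [3, 3, 2, 3, 6],
}

errors = {}
for name, values in samples.items():
    x_bar = mean(values)                  # sample mean
    errors[name] = round(x_bar - mu, 2)   # sampling error = statistic - parameter
    print(f"{name}: x_bar = {x_bar}, sampling error = {errors[name]}")
```

Two samples of the same size from the same population give different errors, which is the behaviour the sampling distribution describes.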
11

Non-sampling Errors
• All errors other than sampling error
• Missing data
• Recording errors
• Measurement errors
• Input processing errors
• Analysis errors
• Response bias
• Non-response bias
• And many more!
12

The Central Limit Theorem


13

Example: Pickleball Players


• Below are the details of a small pickleball team (the population):
• The mean age is µ=62.8

Name Age
Bobby 54
Karla 55
Alec 59
Tyrese 63
Nikolai 64
Lewis 68
Cora 69
Payton 70
14

Example: Ages of Pickleball Players


• Suppose we took samples of n=2 players:

Sample  Players         Values    Mean ($\bar{x}$)  Sampling Error       Note
1       Bobby, Karla    [54, 55]  54.5              54.5 – 62.8 = -8.3   “Unlucky” sample
2       Cora, Payton    [69, 70]  69.5              69.5 – 62.8 = 6.7    “Unlucky” sample
3       Alec, Nikolai   [59, 64]  61.5              61.5 – 62.8 = -1.3   Better
15

Example: Ages of Pickleball Players

• Suppose we took all possible samples of n=2
• What can we say about the distribution of these sample means?

Sample  Players          Values    Mean ($\bar{x}$)  Sampling Error
1       Bobby, Karla     [54, 55]  54.5              54.5 – 62.8 = -8.3
2       Cora, Payton     [69, 70]  69.5              69.5 – 62.8 = 6.7
3       Alec, Nikolai    [59, 64]  61.5              61.5 – 62.8 = -1.3
4       Bobby, Alec      [54, 59]  56.5              56.5 – 62.8 = -6.3
5       Bobby, Tyrese    [54, 63]  58.5              58.5 – 62.8 = -4.3
6       Bobby, Nikolai   [54, 64]  59.0              59.0 – 62.8 = -3.8
…
28      Payton, Lewis    [70, 68]  69.0              69.0 – 62.8 = 6.2
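The full list of 28 samples can be generated rather than typed out. A minimal Python sketch, using the ages from the team table; the ordering of the pairs will not match the slide's sample numbers.

```python
from itertools import combinations
from statistics import mean

ages = {"Bobby": 54, "Karla": 55, "Alec": 59, "Tyrese": 63,
        "Nikolai": 64, "Lewis": 68, "Cora": 69, "Payton": 70}

mu = mean(list(ages.values()))  # 62.75, shown rounded as 62.8 on the slide

# Every possible sample of n=2 players (order does not matter).
sample_means = [mean(pair) for pair in combinations(ages.values(), 2)]

mean_of_means = mean(sample_means)
print(len(sample_means), mu, mean_of_means)
```

The mean of all 28 sample means equals the population mean, even though individual sample means range from 54.5 to 69.5.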


16

Example: Ages of Pickleball Players


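• Distribution of the sample means:
[Figure: two histograms. One is the original distribution of ages (“Recall, original distribution”). The other is the distribution of the sample means, called the sampling distribution of the sample mean. It looks different than the original distribution: it looks Normal!]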
17

The Central Limit Theorem


• If sufficiently large samples (n > 30) are repeatedly drawn, the sample means are approximately normally distributed
• Regardless of the shape of the population!
• If the population is normal, then any sample size works (don’t need “large”)
18

Another Example

[Figure: histograms of Sample #1 through Sample #6, followed by the distribution of the 6 sample means and the distribution of 10,006 sample means]

Source: [Link]
19

Why is CLT Cool?


• Even if we don’t know anything about the population, we can get a good guess of its mean (along with a confidence interval)!
• If you took all possible samples, the mean of the sample means will be the same as the population mean
• $\mu_{\bar{x}} = \mu$
• The std deviation of the sample means is:
• $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• Sometimes called the standard error of the mean
• As n gets larger, $\sigma_{\bar{x}}$ gets smaller → each sample’s mean is closer to the population mean
• We will see later that the CLT also allows us to do some useful statistical comparisons
20

Applying the CLT


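• Using CLT, $\bar{X}$ is a normal random variable:
• $\bar{X} \sim N\left(\mu_{\bar{x}} = \mu_x,\ \sigma_{\bar{x}} = \sigma_x / \sqrt{n}\right)$ (we assume we know $\sigma_x$)
• And we can answer probability questions:
• “What is the probability that a sample of 35, drawn from a population with a mean of 50 and std of 3, has a mean of at most 54?”
• Also, we can standardize it using the same transformation as before
• $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_x / \sqrt{n}}$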
21

Example: Grocery Store


• At a grocery store, the customer spend is 𝜇 = 85.00, 𝜎 = 9.00
• What is the probability that the next 40 customers spend an average
of $87.00 or more?
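One way to work this example is sketched below in Python, using the standard library's `NormalDist` in place of the spreadsheet function. The numbers are from the slide; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85.00, 9.00, 40

# CLT check from the slides: the sample must be large enough.
clt_applies = n > 30

# Standard error of the mean: sigma / sqrt(n)
std_error = sigma / sqrt(n)

# Sampling distribution of the sample mean, then the upper tail at 87.
sampling_dist = NormalDist(mu=mu, sigma=std_error)
p_at_least_87 = 1 - sampling_dist.cdf(87.00)

print(clt_applies, round(std_error, 4), round(p_at_least_87, 4))
```

The result is roughly 0.08: an average of $87 or more across 40 customers is unusual but not extraordinary when the true mean is $85.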
22

Recap: Using the CLT with Means


• Check if CLT applies (i.e., n > 30)
• If not, STOP
• Call NORM.DIST($\bar{x}$, $\mu$, $\sigma_x / \sqrt{n}$, TRUE) as appropriate
23

Quick Check 1: Department Store


• Suppose that during any hour in a large department store, the
average number of shoppers is 448 with 𝜎 = 21
• What is the probability that a random sample of 49 different hours
will yield a sample mean between 441 and 446 shoppers?
24

Finite Population Correction Factor


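• If the population is small (“finite population”), and the sample is more than 5% of the population (n/N > 0.05), then you should apply the finite population correction factor:
• $\bar{X} \sim N\left(\mu_{\bar{x}} = \mu_x,\ \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}\right)$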
25

Example: Production Company


• Has 350 employees. Average age is 37.6, 𝜎 = 8.3
• If a random sample of 45 employees is taken, what is the probability
that the sample mean will be less than 40?
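A Python sketch of this example with the finite population correction factor applied. The figures are from the slide; `NormalDist` stands in for the spreadsheet call, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N, n = 350, 45          # population size, sample size
mu, sigma = 37.6, 8.3   # population mean age and std deviation

clt_applies = n > 30
needs_fpcf = n / N > 0.05   # 45/350 is about 12.9%, so the correction applies

std_error = sigma / sqrt(n)
if needs_fpcf:
    std_error *= sqrt((N - n) / (N - 1))   # finite population correction factor

p_less_than_40 = NormalDist(mu=mu, sigma=std_error).cdf(40)

print(needs_fpcf, round(std_error, 4), round(p_less_than_40, 4))
```

The correction shrinks the standard error from about 1.24 to about 1.16, and the probability comes out at roughly 0.98.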
26

New Recap: Using the CLT with Means


• Check if CLT applies (i.e., n > 30)
• If not, STOP
• Check if FPCF needs to be applied (i.e., n/N >= 5%)
• If yes, call NORM.DIST($\bar{x}$, $\mu$, $\frac{\sigma_x}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$, TRUE) as appropriate
• If not, call NORM.DIST($\bar{x}$, $\mu$, $\sigma_x / \sqrt{n}$, TRUE) as appropriate


27

Quick Check 2: White German Shepherds


• There are 4000 White German Shepherds in the USA
• The mean weight is 75.45 pounds, with a std dev of 10.37 pounds
• We take a sample of 100 of these dogs
• What is the probability that the sample’s mean weight is within +/- 2
pounds of the population’s mean?
28

Quick Check 2: White German Shepherds (Part B)


• There are 1000 White German Shepherds in the USA
• The mean weight is 75.45 pounds, with a std dev of 10.37 pounds
• We take a sample of 100 of these dogs
• What is the probability that the sample’s mean weight is within +/- 2
pounds of the population’s mean?
29

Sampling Distributions of $\bar{x}$
(with unknown $\sigma$)
→ Student’s t-Distribution
30

t-Distribution to the Rescue


• If we know the population σ, we can use NORM.DIST to make some conclusions ☺
• If we don’t know the population σ, we can’t use NORM.DIST to make any conclusions ☹
• However, instead of the normal distribution, we can use a t-distribution
• T.DIST()
• Similar to normal, just a little wider

[Figure: Z and t density curves centered at 0; the t curve is wider]
31

t-Distribution
• Similar to z: $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$ (s instead of $\sigma_x$)
• Bell shaped and symmetrical
• Wider than the normal to account for the uncertainty associated with s
• Degrees of freedom (df): n - 1
• As df gets bigger and bigger, t looks more and more like Z
• As $df \to \infty$, $t \to Z$

[Figure: Z curve compared with t curves for df = 150 and df = 15, all centered at 0]
32

Example: The Dean’s Claim


• The Dean claims that new graduates make $800 per week
• You are skeptical and want to double-check
• You do a survey of 25 new grads and ask their weekly salary
• $\bar{x}$ = 750, s = 89
• Are your findings consistent with the Dean’s claim?
• That is: What is the probability that a sample mean of 750 or lower would be found if the population mean really was 800?
• P($\bar{x}$ <= 750) = ?
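A possible way to check this in Python is sketched below. The standard library has no t-distribution, so `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density numerically with Simpson's rule; this is a stand-in for a cumulative t-distribution call, not the method used in the course. No finite population correction is applied, on the assumption that 25 grads are a small share of all new graduates.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] plus symmetry (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 0.5
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # area under the density between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

claimed_mu = 800          # the Dean's claim
x_bar, s, n = 750, 89, 25

t_stat = (x_bar - claimed_mu) / (s / sqrt(n))   # about -2.81
df = n - 1
p_value = t_cdf(t_stat, df)                     # P(x_bar <= 750) if mu really is 800

print(round(t_stat, 3), df, round(p_value, 4))
```

The probability is roughly 0.005, so a sample mean this low would be very unlikely if the claim of $800 were true.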
33

Recap: Using the t-Distribution


• Calculate t
• Check if FPCF needs to be applied (i.e., n/N >= 5%)
• If yes: $t = \frac{\bar{x} - \mu_{\bar{x}}}{\frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}}$
• If no: $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
• Calculate df (i.e., n-1)
• Call T.DIST(t, df, TRUE) as appropriate
34

Quick Check 3: The Dean’s Claim, Part 2


• “Oops! I meant $760 per week,” says the Dean.
• Using the same survey of 25 new grads, what can we conclude now?
35

Quick Check 4: Burger Prices


• Your buddy claims the average burger price is $7
• You are skeptical and want to double check
• You go to 16 restaurants and look at the prices
• $\bar{x}$ = 9.50, s = 2.00
• What can you conclude?
36

Sampling Distribution of the Sample Proportion
37

Sampling Distribution of The Proportion


• For proportions, i.e., the nominal (“countable”) scale of measurement
• A proportion (denoted p) is a fraction, ratio, or percent indicating the part of the sample or the population having a particular characteristic
• E.g., If 42 out of 60 students are female, we say the proportion is $p = \frac{42}{60} = 0.70$
• The proportion of a sample is denoted $\hat{p}$
• Must check if the CLT applies:
• n*p must be > 5
• n*(1-p) must be > 5
• If so, the sampling distribution of the sample proportion is (approximately) normal:
• $\hat{p} \sim N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\right)$, can use $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
38

Example: Is Bob Right?


• “What % of our parts are defective?” asks the boss.
• “Only 10%, ma'am!” claims Bob.
• Is Bob right? Let’s see.
• The boss randomly selects 80 parts and finds that 12 parts are defective.
What are the chances?
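A Python sketch of one reading of this question: if Bob's 10% is correct, how likely is a sample of 80 with 12 or more defective parts? That interpretation is an assumption, as is skipping the finite population correction (the total number of parts is not given). `NormalDist` stands in for the spreadsheet call.

```python
from math import sqrt
from statistics import NormalDist

p = 0.10             # Bob's claimed population proportion
n = 80               # parts inspected
p_hat = 12 / n       # observed sample proportion = 0.15

# CLT check for proportions, as stated on the slide.
clt_applies = n * p > 5 and n * (1 - p) > 5

std_error = sqrt(p * (1 - p) / n)    # sigma of p-hat

# Chance of a sample proportion of 0.15 or higher if p really is 0.10.
p_at_least_observed = 1 - NormalDist(mu=p, sigma=std_error).cdf(p_hat)

print(clt_applies, round(std_error, 4), round(p_at_least_observed, 4))
```

Under these assumptions the chance is roughly 0.07, so the sample is somewhat surprising but does not clearly contradict Bob.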
39

Sampling Distribution of The Proportion


If we have a finite population, a finite population correction factor (FPCF) should be applied…

When n/N > 0.05

$\hat{p} \sim N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}\right)$
40

Recap: Using the CLT with Proportions


• Check if CLT applies (i.e., n*p > 5 and n*(1-p) > 5)
• If not, STOP
• Check if FPCF needs to be applied (i.e., n/N >= 5%)
• If yes, call NORM.DIST($\hat{p}$, $p$, $\sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$, TRUE) as appropriate
• If not, call NORM.DIST($\hat{p}$, $p$, $\sqrt{\frac{p(1-p)}{n}}$, TRUE) as appropriate
41

Quick Check 5: Toronto Rental Market


• The rental market in Toronto is huge with 1,000,000 units
• Suppose that 20% of single units rent for more than $2480 / month.
Let’s call those units high-priced units
• p = population proportion of high-priced units
• $\hat{p}$ = sample proportion of high-priced units
• If you select a random sample of 30 units:
a. Can you apply the CLT?
b. How would the sample proportion of high-priced units be distributed?
c. How likely is it that, in the sample, the proportion of high-priced units is more than 25%?
42

Quick Check 6: Election


• Angie is considering running for mayor of her Fine Town for the
second time
• The population size of Fine Town is 5000
• The first time she received 75% of the popular vote
• What is the probability that in a sample of 300 town residents, at least 240 would vote in favour of her for town mayor for the second time?
43

More Practice
• [See ‘COMM 162 0 Week 5 – [Link]’]
44

Summary
45

Summary
• CLT: the distribution of the sample mean is approximately normal (if sample size is big enough)
• Sampling a mean when σ is known: use Z-distribution $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_x / \sqrt{n}}$
• Sampling a mean when σ is unknown: use t-distribution $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
• Sampling a proportion: use $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
