Chapter 9: Sampling Distributions & Confidence Intervals for a Proportion
Example #1: Classroom Experiment: Simulating a sampling distribution for a sample proportion.
Suppose students are asked to toss a penny and record the number of heads they get. 200 students were directed
to consider the distribution of sample proportion values from samples of 10 and 100 tosses.
Trial #1:
Variable N n Mean StDev
Sample Prop 200 10 0.4980 0.1490
[Figure: Histogram of sample proportion of heads, n = 10. Horizontal axis: proportion; vertical axis: frequency.]
Trial #2:
Variable N n Mean StDev
Sample Prop 200 100 0.49790 0.05031
[Figure: Histogram of sample proportion of heads, n = 100. Horizontal axis: proportion; vertical axis: frequency.]
Question #1: Did the students get the same sample proportion every time? This is called Sampling Variability.
Question #2: Compare the two graphs. Which one did a better job at estimating the true proportion?
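The classroom experiment can be imitated on a computer. The sketch below is only an illustration: it assumes a fair coin (p = 0.5), and the function name `simulate_sample_proportions` and the seed are made up for this example. Its output will not match the class data above exactly, because every run of random tosses is different.

```python
import random
import statistics

def simulate_sample_proportions(num_students, n, p=0.5, seed=None):
    """Return one sample proportion of heads per student, each based on n tosses."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_students):
        # Count heads in n tosses; each toss lands heads with probability p (assumed fair coin).
        heads = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(heads / n)
    return proportions

for n in (10, 100):
    props = simulate_sample_proportions(num_students=200, n=n, seed=1)
    print(f"n = {n:3d}: mean = {statistics.mean(props):.4f}, "
          f"st. dev. = {statistics.stdev(props):.4f}")
```

Both means should land near 0.5, while the standard deviation for n = 100 should be noticeably smaller than for n = 10, which is the pattern seen in Trial #1 and Trial #2.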
❑ Modeling the Distribution of Sample Proportions
If we consider a binomial experiment:
The sample proportion (p-hat): $\hat{p} = \dfrac{\text{number of successes in sample}}{\text{total number in the sample}}$
Notation: $\hat{p} = \dfrac{x}{n}$
Sample proportions summarize categorical variables.
Assumptions and Conditions:
1. The sampled values must be independent of one another.
2. The sample size, n, must be large enough: as a safe (and conservative) rule of thumb, check that the
number of successes and the number of failures are at least 10.
$np \ge 10$
$nq \ge 10$
3. If sampling is done without replacement, the sample size must be no larger than 10% of the
population. Usually, populations are so large that the sample is well under 10% of the population.
4. Randomization condition: If you have a survey, the sample should be a simple random sample. If the data
comes from an experiment, subjects should have been randomly assigned to treatments.
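The numerical conditions above (2 and 3) can be checked mechanically. The following is a minimal sketch; the function name `check_normal_model_conditions` and its arguments are hypothetical, chosen only for this illustration. Independence and randomization (conditions 1 and 4) depend on how the data were collected and cannot be verified by arithmetic.

```python
def check_normal_model_conditions(p, n, population_size=None):
    """Check the success/failure condition and, if a population size is given, the 10% condition."""
    q = 1 - p
    results = {
        "np >= 10": n * p >= 10,
        "nq >= 10": n * q >= 10,
    }
    if population_size is not None:
        # Only relevant when sampling without replacement.
        results["n <= 10% of population"] = n <= 0.10 * population_size
    return results

# Coin example from the classroom experiment, assuming a fair coin (p = 0.5).
print(check_normal_model_conditions(p=0.5, n=10))    # np = 5, so the condition fails
print(check_normal_model_conditions(p=0.5, n=100))   # np = nq = 50, so both hold
```

This is consistent with the two histograms in the classroom experiment: with n = 10 the success/failure condition is not met, while with n = 100 it is.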
The Sampling Distribution Model for a Proportion
Mean of a sample proportion: $\mu_{\hat{p}} = p$
Standard deviation of a sample proportion: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{pq}{n}}$
When n is sufficiently large and the true proportion p is not too near 0 or 1, the sampling distribution model
for a proportion is approximately normal: $\hat{p} \sim AN\left(p, \sqrt{\dfrac{pq}{n}}\right)$
So, to find probabilities about $\hat{p}$, first standardize to a z, then use the z-table.
• Note: The standardized z-score we will use for sample proportions is as follows:
$Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$
*Note: $q = 1 - p$
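As an illustration of the standardize-then-look-up procedure, the sketch below uses Python's standard-library normal distribution in place of a z-table. The function name `prob_phat_at_most` is made up for this example, and the numbers come from the coin experiment under the assumption of a fair coin (p = 0.5, n = 100).

```python
import math
from statistics import NormalDist

def prob_phat_at_most(value, p, n):
    """Return (z, P(p-hat <= value)) under the normal model for a sample proportion."""
    q = 1 - p
    sd = math.sqrt(p * q / n)        # standard deviation of p-hat
    z = (value - p) / sd             # standardize
    return z, NormalDist().cdf(z)    # area to the left of z, as read from a z-table

z, prob = prob_phat_at_most(0.45, p=0.5, n=100)
print(f"z = {z:.2f}, P(p-hat <= 0.45) = {prob:.4f}")
# prints: z = -1.00, P(p-hat <= 0.45) = 0.1587
```

For an "at least" probability, subtract the result from 1.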
Example #1: The Associated Press reported that 71% of Americans ages 25 and older are overweight. A
researcher wants to know whether the proportion of such individuals in his state that are overweight differs from
the national proportion. A random sample of 600 adults in his state results in 405 who are classified as
overweight.
a. What is the sample proportion of overweight Americans?
b. Check and verify all of the assumptions and conditions.
c. Describe the sampling distribution of the sample proportion for size 600 using the appropriate
notation.
d. Find the probability that at most 0.675 of the 600 sampled adults are classified overweight.
Example #2:
A study found that the national freshman-to-sophomore retention rate was 74%. In a random sample of 103
freshmen at a large university, 81% returned the next year as sophomores.
a) Describe the sampling distribution of the sample proportion of freshmen that return as sophomores.
b) What is the probability that the retention rate in a sample of 103 students is at least 0.81?
Extra Practice Problems:
1. Suppose that 13% of the population is left-handed. An auditorium at a large college has been built with 15
“lefty seats”, seats that have the built-in desk on the left. A class is made up of a random sample of 90 students
from this large college.
A. Are the requirements for using a normal model met? Identify (and check) two requirements:
1. ______________________________________________________________________
2. ______________________________________________________________________
B. What is the probability that more than 17% of the sample of 90 students is left-handed? (so there will not be
enough appropriate seats for the left-handed students)
Answer:
2. Assume that 30% of students at a university wear contact lenses. A random sample of 100 students is selected.
a) Describe the sampling distribution of p̂ , the sample proportion of students who wear contacts.
b) What is the probability that more than one third of this sample wear contacts?
Answer:
Confidence Intervals for Proportions
❑ A point estimate of a population characteristic (parameter) is a single number that is based on sample data
(statistic) and that represents a feasible value of the characteristic of interest.
❑ An unbiased estimator is a sample statistic whose mean value is equal to the value of the population
parameter being estimated. We also want the estimate to be precise (have low variability).
The sample mean ___ is an unbiased estimator of the true mean (µ).
The sample proportion ___ is an unbiased estimator of the true proportion (p).
❑ A confidence interval (CI) for a population parameter is an interval estimate of feasible values for the
parameter of interest. It is calculated so that, with a certain degree of confidence (confidence level), the value
of the parameter is captured within the lower and upper endpoints of the interval.
❑ The confidence level, determined by the researcher, represents the success rate of the method used to
construct the interval.
Recall the Sampling Distribution of p̂ :
Remember: if many samples of the same size (n) are taken and the sample proportion is calculated for each, the
sampling distribution of the sample proportions is approximately normal with the following mean and standard
deviation (as long as the numbers of successes and failures are both at least 10):
$\hat{p} \sim AN\left(p, \sqrt{\dfrac{pq}{n}}\right)$
In this chapter, we are trying to draw some conclusions about the true population proportion p. Since p is unknown,
the sample proportion $\hat{p}$ is used in its place all the way through the calculation, so we need an estimated
version of the standard deviation of $\hat{p}$. The estimated version is called the standard error of $\hat{p}$.
Standard error of $\hat{p}$: $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
Margin of Error:
❑ The margin of error shows how accurate we believe our estimate to be based on the variability of the
estimate.
❑ In general, for a particular confidence level (Level C), the margin of error (ME) is:
$ME = Z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
where ME is the margin of error, $Z^*$ is the standardized Z score based on the level of confidence, and
$\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ is the standard error of the sample proportion.
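The margin of error can be computed directly once the critical value is known. The sketch below is one possible way to do it: the function names `critical_z` and `margin_of_error` are hypothetical, and the critical value is taken from the standard normal distribution in Python's standard library rather than from a printed table.

```python
import math
from statistics import NormalDist

def critical_z(confidence_level):
    """Return Z*, the value that leaves (1 - C)/2 in each tail of the standard normal curve."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

def margin_of_error(p_hat, n, confidence_level):
    """Return Z* times the standard error of the sample proportion."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return critical_z(confidence_level) * standard_error

for level in (0.90, 0.95, 0.99):
    print(f"C = {level:.0%}: Z* = {critical_z(level):.3f}")

# Illustrative values only (not taken from any example in this chapter).
print(f"ME = {margin_of_error(p_hat=0.5, n=100, confidence_level=0.95):.4f}")
```

The three critical values printed are approximately 1.645, 1.960, and 2.576, the usual table values for 90%, 95%, and 99% confidence.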
Confidence Intervals for p:
An approximate level C confidence interval for p has two components:
Point Estimate ± Margin of Error
$\hat{p} \pm Z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
Write out the interval with a lower and an upper endpoint in parentheses:
$\left(\hat{p} - Z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right)$
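Putting the pieces together, the interval can be computed from the number of successes, the sample size, and the confidence level. This is a minimal sketch, not a full implementation: the function name `proportion_ci` and the sample counts are invented for illustration, and the function does not check the assumptions that must hold before the interval is used.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence_level):
    """Return (lower, upper) for the interval p-hat +/- Z* * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 successes out of 200.
lower, upper = proportion_ci(successes=60, n=200, confidence_level=0.95)
print(f"({lower:.4f}, {upper:.4f})")
```

For this hypothetical sample the point estimate is 0.30, and the 95% interval extends roughly 0.064 on either side of it.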
We can construct confidence intervals for p if the following assumptions are satisfied:
1) $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$
2) The sample size is less than 10% of the population size if sampling is without replacement
3) The sample can be regarded as a simple random sample from the population of interest.
4) The data values are assumed to be independent of each other.
What affects the width of the interval (margin of error)?
In order to decrease the margin of error for greater precision we should (circle):
1. Increase or decrease the confidence level
2. Increase or decrease the sample size
❑ Interpretation of the interval (IMPORTANT)!!!
▪ We are C% confident that the true proportion p is captured in our interval.
▪ If we took many, many random samples from a population and calculated a confidence interval
for each sample, C% of the Confidence Intervals would contain the true proportion p.
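The "many, many random samples" interpretation can be illustrated by simulation. In the sketch below, the true proportion (0.3), the sample size (200), the number of repeated samples (1000), and the seed are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def proportion_ci(successes, n, confidence_level):
    """Return (lower, upper) for p-hat +/- Z* * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage_rate(true_p, n, confidence_level, num_samples, seed=None):
    """Draw many samples and return the fraction of intervals that capture true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        lower, upper = proportion_ci(successes, n, confidence_level)
        if lower <= true_p <= upper:
            hits += 1
    return hits / num_samples

rate = coverage_rate(true_p=0.3, n=200, confidence_level=0.95,
                     num_samples=1000, seed=7)
print(f"Share of intervals that captured p: {rate:.3f}")
```

The printed share should be close to 0.95. Any single interval either contains p or it does not; the 95% describes how often the method succeeds over repeated samples.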
Example #1: A local restaurant keeps records of reservations and no-shows. In a random sample of 150 Saturday
reservations, it is found that 70 of them were no-shows.
a. Find the sample proportion of Saturday no-shows.
b. Find a 95% confidence interval for the true proportion of Saturday no-shows at this restaurant.
c. Which of the following statements is a correct interpretation of the confidence interval obtained?
A. We can be 95% confident that the proportion of Saturday no-shows in the sample is within the interval
obtained.
B. If this study were to be repeated with a sample of the same size, there is a .95 probability that the sample
proportion would be in the interval obtained.
C. We can be 95% confident that the population proportion of Saturday no-shows is within the interval
obtained.
D. There is a 95% probability that the population proportion of Saturday no-shows is within the interval
obtained.
Example #2:
In a random survey of 900 American adults, 380 said that their favorite sport is football.
a) Verify that the requirements for constructing a confidence interval are satisfied.
b) Construct a 90% confidence interval for the population proportion of American adults who say that their
favorite sport is football.
c) Give a one-sentence interpretation of the confidence interval.
Example #3:
A study was conducted to estimate the true proportion of students who drive a convertible. A random sample of
200 student cars was selected, and a 90% confidence interval for the true proportion of convertibles driven by
students was calculated to be (0.0520, 0.1188)
a. Find the point estimate for the true proportion.
b. Find the margin of error.
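Because the interval is built as point estimate ± margin of error, the point estimate is the midpoint of the interval and the margin of error is half of its width. A short sketch, using a made-up interval rather than the one in this example (the function name `estimate_and_margin` is hypothetical):

```python
def estimate_and_margin(lower, upper):
    """Recover the point estimate (midpoint) and margin of error (half-width) from an interval."""
    point_estimate = (lower + upper) / 2
    margin = (upper - lower) / 2
    return point_estimate, margin

# Hypothetical interval, for illustration only.
p_hat, me = estimate_and_margin(0.40, 0.50)
print(f"point estimate = {p_hat:.2f}, margin of error = {me:.2f}")
```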
Sample Size for a Desired Margin of Error
If we want to fix the margin of error with a certain level of confidence, we can determine the number of subjects
(n) needed in the sample:
$n = \dfrac{(Z^*)^2 \, \hat{p}\hat{q}}{ME^2}$
p̂ is an estimated amount for the population proportion (the sample proportion).
If p̂ is unknown, use the most conservative value of p* = 0.5.
Since n is the sample size it must be a whole number! Be conservative and round up!
*Note: In order to cut your margin of error in half you need to quadruple your sample size.
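The sample size formula, including the round-up rule, can be sketched as follows. The function name `required_sample_size` is hypothetical, the default of 0.5 reflects the conservative choice described above, and the margins of error in the usage lines are illustrative values that do not come from any example in this chapter.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence_level, p_guess=0.5):
    """Return the smallest whole n that achieves the desired margin of error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = (z_star ** 2) * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n)  # always round up to a whole subject

print(required_sample_size(0.05, 0.95))    # 385
print(required_sample_size(0.025, 0.95))   # 1537
```

Halving the margin of error from 0.05 to 0.025 raises the required sample size from 385 to 1537, roughly four times as many subjects, which matches the note above.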
Example: Affirmative Action
A sociologist wishes to conduct a poll in order to estimate the percentage of Americans who judge that
affirmative action programs for minorities and women should be continued at some level. What sample size
should be obtained if she wishes the estimate to be within 3 percentage points with 95% confidence if:
a) She uses a 2007 estimate of 85% obtained from a CNN poll?
b) She does not use any prior estimates?
c) Compare the results from a) and b)
Practice Problems:
1. A new technique, balloon angioplasty, claims to open clogged heart valves and vessels. The balloon is
inserted via a catheter and is inflated, opening the vessel; thus, no surgery is required. The surgical technique for
unclogging valves and vessels has a 60% success rate. A preliminary clinical trial is conducted to test the balloon
technique: 50 people with clogged vessels are given the balloons, and 39 of these respond to the technique.
a. Calculate a 90% confidence interval for p the true population proportion.
b. Interpret your interval in the context of the problem.
c. Do you think there is evidence for the superiority of the balloon technique over surgery? Explain.
d. The phase II clinical trials for the balloon angioplasty technique require estimating the true population
proportion with a high accuracy. If the researchers wish to have a 99% confidence interval with an error
of not more than 2%, how many subjects will they need in this phase of testing?
2. In May of 2006, the Gallup Organization asked a random sample of 537 American adults what they thought
was the better penalty for murder, the death penalty or life imprisonment without the possibility of parole. Of
those polled, 47% indicated support for the death penalty.
a) Find a 95% confidence interval for the percentage of all American adults who favor the death penalty.
b) Based on your confidence interval, is it clear that the death penalty no longer has majority support?
Explain.
c) If pollsters wanted to follow up on this poll with another survey that could determine the level of support
for the death penalty to within 2% with 98% confidence, how many people should they poll?
3. An insurance company checks police records on 582 accidents and notes that teenagers were at the wheel in 91
of them.
a) Construct a 99% confidence interval for the true proportion of all auto accidents that involve teenage
drivers.
b) Give a one-sentence interpretation of your confidence interval.
c) A politician urging tighter restrictions on driver's licenses issued to teenagers claims that 1 in every 5
accidents involves teenage drivers. Does your confidence interval provide evidence against this claim?
d) Suppose the researchers decided that they wanted to obtain a shorter confidence interval than the one
calculated. This could be accomplished by:
I. Using a smaller sample size
II. Using a higher level of confidence
III. Using a larger sample size
IV. Using a lower level of confidence
a) I only b) III only c) IV only
d) II only e) I and II only f) III and IV only
Answer:
Use the following to answer questions 4 and 5:
A study of rural literacy in boys age 10-15 in Mali found with 95% confidence that the true proportion of rural
boys that are literate is between 0.51 and 0.57.
4. Which of the following are true statements about this confidence interval?
A. We don’t know exactly what proportion of rural Malian boys, age 10-15, are literate, but the interval
from 0.51 to 0.57 contains the true proportion.
B. We know that the proportion of Malian boys, age 10-15, that are literate is between 0.51 and 0.57.
C. We are 95% confident that between 51% and 57% of rural Malian boys, age 10-15, are literate.
D. Between 51% and 57% of rural Malian boys, age 10-15, are literate.
Answer
5. What are the point estimate of the true proportion and the margin of error?
A. 0.90, 0.03 B. 0.459, 0.05 C. 0.54, 0.06 D. 0.54, 0.03
Answer