Sampling Techniques and Data Presentation
Key Terms
Population
Sample
Summary
For example, board members of a homeowners’ association want to
survey the homeowners to get their opinions about new paint colors for
common walls.
The population is the set containing all the homeowners in the
neighborhood.
For example, a city council of a small city needs to know whether its
residents will support a renovation of the city park. Describe the
population.
The population is the set containing all residents of the city.
CHECKPOINT 1 - (p. 773)
A city government wants to conduct a survey among the city’s
homeless to discover their opinions about required residence in
city shelters from midnight until 6 a.m.
Objective 2 – Select an appropriate sampling technique.
Key Terms
Random sample, Representative sample
Summary
CHECKPOINT 2 - p. 775
Objective 3 – Organize and present data.
Key Terms
Data item
Data value
Frequency distribution
Frequency
Summary
After data have been collected from a sample of the population, the next
task is to present the data in a condensed and manageable form, so the
data can be more easily interpreted.
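As a minimal sketch of condensing raw data into a frequency distribution, the snippet below counts how often each data value occurs. The quiz scores and the helper name `frequency_distribution` are assumptions chosen for illustration, not part of the text's own tables.

```python
from collections import Counter

def frequency_distribution(data):
    """Return (data value, frequency) pairs, sorted by data value."""
    counts = Counter(data)
    return sorted(counts.items())

# Illustrative data items (assumed for this sketch): scores on a pop quiz.
scores = [8, 10, 9, 9, 6, 8, 7, 9]
table = frequency_distribution(scores)

print("Score  Frequency")
for value, frequency in table:
    print(f"{value:>5}  {frequency:>9}")
```

The frequencies always add up to the number of data items, which is a quick check that no item was missed.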
[Figures: a histogram; a frequency polygon; a histogram with a superimposed frequency polygon]
Objective 4 – Identify deceptions in visual displays of data.
Key Terms
None
Summary
Six Things to Watch for in Visual Displays of Data
1. Is there a title that explains what is being displayed?
2. Are numbers lined up with tick marks on the vertical axis that clearly indicate
the scale? Has the scale been varied to create a more or less dramatic impression
than shown by the actual data?
3. Do too many design and cosmetic effects draw attention from or distort the
data?
4. Has the wrong impression been created about how the data are changing
because equally spaced time intervals are not used on the horizontal axis?
Furthermore, has a time interval been chosen that allows the data to be interpreted
in various ways?
5. Are bar sizes scaled proportionately in terms of the data they represent?
6. Is there a source that indicates where the data in the display came from? Do
the data come from an entire population or a sample? Was a random sample used
and, if so, are there possible differences between what is displayed in the graph and
what is occurring in the entire population? Who is presenting the visual display, and
does that person have a special case to make for or against the trend shown by the
graph?
Guided Example
What is misleading in the graphic display?
The dollar bills in the display vary in both length and width to show the
diminishing purchasing power of the dollar over time. However, it is the length of
each dollar bill that is proportional to its purchasing power. Because our eyes
focus on the areas of the dollar-shaped bars, this creates the impression that the
purchasing power of the dollar diminished even more than it really did. If the area
of the dollar were drawn to reflect its purchasing power, the 2005 dollar would be
approximately twice as large as the one shown in the graphic display.
Practice 33 (page 784) Describe what is misleading in each visual display of
data.
Books display: The sizes of the books are not scaled proportionally in terms of
the data they represent.
Pie graph: The sectors representing these six countries use up 100% of the pie
graph, yet the percentages for these six countries total only 57%. This may give
the misleading impression that the U.S. has about 50% of the world’s computer use.
TV screens display: The sizes of the TV screens are not scaled proportionally in
terms of the data they represent.
Section 12.2 Measures of Central Tendency
Objective 1 – Determine the mean for a data set.
Objective 2 – Determine the median for a data set.
Objective 3 – Determine the mode for a data set.
Objective 4 – Determine the midrange for a data set.
Key Terms
Measures of central tendency
Mean
Summary
Numbers that represent an average are known as measures of central tendency.
Four such measures are: the mean, the median, the mode and the midrange.
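A brief sketch of the four measures, using Python's standard `statistics` module. The quiz scores are illustrative, and the `midrange` helper is an assumption written for this example, using the usual definition of the midrange as the mean of the lowest and highest data values.

```python
from statistics import mean, median, mode

def midrange(data):
    """Midrange: the mean of the lowest and highest data values."""
    return (min(data) + max(data)) / 2

# Illustrative data items (assumed for this sketch): scores on a pop quiz.
scores = [8, 10, 9, 9, 6, 8, 7, 9]

summary = {
    "mean": mean(scores),          # sum of the items divided by how many there are
    "median": median(scores),      # middle item of the ordered data
    "mode": mode(scores),          # most frequently occurring item
    "midrange": midrange(scores),  # halfway between the extremes
}
print(summary)
```

The four values generally differ, which is why it matters which "average" is being reported.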
Are inventors born or made? Find the mean percentage of adults in the ten
countries who agree that inventiveness can be learned.
[Figure 12.6]
CHECKPOINT 1 – p.787
Use Figure 12.6 (p. 787) to find the mean percentage of adults in the ten
countries who agree that inventiveness is inherited.
Mean $= \frac{\sum x}{n}$
The mean percentage of adults in the ten countries who agree that
inventiveness is inherited is 20.1%.
p. 788
Key Terms
Median
Summary
Another type of measure of central tendency is called
the median, which is the middle number of a data set.
For example, consider the scores on the pop quiz: 8, 10, 9, 9, 6, 8, 7, and 9. First,
rearrange the scores in order from smallest to largest: 6, 7, 8, 8, 9, 9, 9, 10. Since
the number of data items is even, the median is the mean of the two middle data
items. The middle two data items are 8 and 9. The mean of these two numbers is:
$\bar{x} = \frac{\sum x}{n} = \frac{8 + 9}{2} = \frac{17}{2} = 8.5$. Thus, the median quiz score is 8.5.
CHECKPOINT 3 – p. 789
Find the median for each of the following groups of data:
a. 28, 42, 40, 25, 35
b. 72, 61, 85, 93, 79, 87
a. First arrange the data items from smallest to largest: 25, 28, 35, 40, 42
The number of data items is odd, so the median is the middle number. The median
is 35.
b. First arrange the data items from smallest to largest: 61, 72, 79, 85, 87, 93
The number of data items is even, so the median is the mean of the two middle data
items.
The median is $\frac{79 + 85}{2} = \frac{164}{2} = 82$.
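The odd/even rule for the median can be sketched as a short function. The function name and structure are assumptions for illustration; the data sets are the ones from Check Point 3 and the pop quiz example.

```python
def median(data):
    """Median: middle item if the count is odd, mean of the two middle items if even."""
    ordered = sorted(data)          # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd number of data items
        return ordered[mid]
    # even number of data items: average the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([28, 42, 40, 25, 35]))        # Check Point 3a
print(median([72, 61, 85, 93, 79, 87]))    # Check Point 3b
print(median([8, 10, 9, 9, 6, 8, 7, 9]))   # pop quiz scores
```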
p. 791
Why such a big difference between these two measures of central
tendency? The relatively high annual salary of the section manager,
$95,000, pulls the mean salary to a value considerably higher than the
median salary. When one or more data items are much greater than the
other items, these extreme values can greatly influence the mean. In cases
like this, the median is often more representative of the data.
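The pull of one extreme value on the mean can be seen in a small sketch. Only the $95,000 manager's salary comes from the example above; the other four salaries are made-up values assumed purely for illustration.

```python
from statistics import mean, median

# Hypothetical salaries: four assumed staff salaries plus the $95,000 manager.
salaries = [30_000, 32_000, 33_000, 35_000, 95_000]

mean_salary = mean(salaries)
median_salary = median(salaries)

# Without the extreme value, the mean drops back toward the other salaries.
mean_without_manager = mean(salaries[:-1])

print(mean_salary, median_salary, mean_without_manager)
```

In this made-up data the mean exceeds every salary except the manager's, while the median stays among the typical salaries.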
Check Point 5 – p. 791
Check Point 6 – p. 792
The total frequency is 1 + 1 + 1 + 3 + 1 + 2 + 2 + 2 + 1 + 2 + 1 + 1 = 18, therefore n = 18.
Since n is even, the median is the mean of the data items in positions 9 and 10.
Counting through the frequency row identifies that the 9th data item is 54 and the
10th data item is 55, so the median is $\frac{54 + 55}{2} = 54.5$.
Check Point 7 – p. 794
c. The mean is so much greater than the median because one data item, Trump’s
net worth, was much greater than those of the other presidents.
Objective 3 – Determine the mode for a data set.
Key Terms:
Mode
Summary
The third type of measure of central tendency is the mode.
For example, the scores on a pop quiz are: 8, 10, 9, 9, 6, 8, 7, and 9. The
score of 9 occurs 3 times. The mode of this data set is 9.
CHECKPOINT 8 – p. 795
Find the mode for each of the following groups of data:
a. 3, 8, 5, 8, 9, 10
b. 3, 8, 5, 8, 9, 3
c. 3, 8, 5, 6, 9, 10
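One way to check answers like these is a small counting sketch. The helper `modes` is an assumption written for this example: it returns every value tied for the highest frequency, and an empty list when no value repeats (that is, when the data set has no mode).

```python
from collections import Counter

def modes(data):
    """Return all values with the highest frequency, or [] if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:                # every value occurs once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([3, 8, 5, 8, 9, 10]))   # one mode
print(modes([3, 8, 5, 8, 9, 3]))    # two modes (bimodal)
print(modes([3, 8, 5, 6, 9, 10]))   # no mode
```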
Objective 4 – Determine the midrange for a data set.
CHECKPOINT 9 – p. 796
CHECKPOINT 10 – p. 796
Section 12.3 Measures of Dispersion
Objective 1 – Determine the range for a data set.
Objective 2 – Determine the standard deviation for a data set.
Key Terms
Range
Summary
Objective 2 – Determine the standard deviation for a data set.
Key Terms
Standard deviation
Summary
The standard deviation of a sample is symbolized by s, while the standard deviation of an entire
population is symbolized by σ (the lowercase Greek letter sigma).
The standard deviation for the four oldest U.S. presidents is approximately
2.16 years.
CHECKPOINT 2 & 3 - p. 802
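Find the mean (Check Point 2) and the standard deviation (Check Point 3) for the
following group of data items: 2, 4, 7, 11. Round to two decimal places.
Mean $= \frac{2 + 4 + 7 + 11}{4} = \frac{24}{4} = 6$
Sum of the squared deviations from the mean: $(2-6)^2 + (4-6)^2 + (7-6)^2 + (11-6)^2 = 16 + 4 + 1 + 25 = 46$
Standard deviation $= \sqrt{\frac{46}{4 - 1}} = \sqrt{\frac{46}{3}} \approx 3.92$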
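The computation can be sketched step by step in code. The function name is an assumption for illustration; it uses the sample formula, dividing the sum of squared deviations by n - 1.

```python
from math import sqrt

def sample_standard_deviation(data):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))

data = [2, 4, 7, 11]
s = sample_standard_deviation(data)
print(round(s, 2))
```

Python's built-in `statistics.stdev` applies the same n - 1 formula and can serve as a cross-check.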
CHECKPOINT 4 - p. 805
As the spread of data items increases, the standard deviation gets larger.
Although both samples have the same mean, the data items in sample B are more spread out.
Therefore, it has a greater standard deviation.
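A quick sketch of this idea, using two made-up samples (assumed for illustration, not necessarily the samples in the text) that share a mean of 20 but differ in spread:

```python
from statistics import mean, stdev

# Hypothetical samples with the same mean but different spread.
sample_a = [17, 18, 19, 20, 21, 22, 23]
sample_b = [5, 10, 15, 20, 25, 30, 35]

mean_a, mean_b = mean(sample_a), mean(sample_b)
s_a, s_b = stdev(sample_a), stdev(sample_b)   # sample standard deviations

print(mean_a, mean_b)
print(round(s_a, 2), round(s_b, 2))
```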
CHECKPOINT 5 - p. 805
Section 12.4 The Normal Distribution
Objective 1 – Recognize the characteristics of normal distributions.
Objective 2 – Understand the 68-95-99.7 Rule
Objective 3 – Find scores at a specified standard deviation from the mean.
Objective 4 – Use the 68-95-99.7 Rule
Objective 5 – Convert a data item to a z-score.
Objective 6 – Understand percentiles and quartiles.
Objective 7 – Use and interpret margins of error.
Objective 8 – Recognize distributions that are not normal.
The ends are lower and the middle bar is the highest.
As the sample size increases, the bars on either side of the middle form a smoother curve upward. The two sides of
the center also become more similar to one another.
The shape of the normal distribution depends on the mean and the standard
deviation. Here are three normal distributions with the same mean but different
standard deviations. The highest point of all three graphs is in the middle. As the
standard deviation gets larger, the graph gets shorter and wider but retains its
symmetric bell shape. The normal distribution is used to make predictions about an
entire population using data from a sample.
Objective 2 – Understand the 68-95-99.7 Rule.
Key Terms
68-95-99.7 Rule
Standard deviation
Summary
The figure shows that very little data lies more than 3 standard deviations
above or below the mean. Additionally, as we move from the mean, the
curve falls rapidly and then more gradually toward the horizontal axis.
The tails of the curve approach, but never touch, the horizontal axis, and
the range of the normal distribution is infinite.
Objective 3 – Find scores at a specified standard deviation from the mean.
Key Terms
Standard deviation
Summary
To find a score that lies a specified number of standard deviations, x, above the
mean, multiply the standard deviation by x and add that product to
the mean. In algebraic terms,
score = mean + 𝒙 ⋅ standard deviation.
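The formula translates directly into a one-line function. The name `score_at` is an assumption for this sketch, and the mean of 70 with a standard deviation of 4 is used only as sample input. A negative x gives a score below the mean.

```python
def score_at(mean, standard_deviation, x):
    """Score that lies x standard deviations from the mean (x < 0 means below)."""
    return mean + x * standard_deviation

# Sample input (assumed): mean 70, standard deviation 4.
print(score_at(70, 4, 2))     # 2 standard deviations above the mean
print(score_at(70, 4, -2))    # 2 standard deviations below the mean
print(score_at(70, 4, 1.5))   # 1.5 standard deviations above the mean
```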
p. 810
Solution:
Key Terms
68-95-99.7 Rule
Summary
Figure 12.13 (p. 810) shows that the distribution of adult male heights in North America is
normal, with a mean of 70 inches and a standard deviation of 4 inches.
p. 811.
a. The 68-95-99.7 Rule states that approximately 95% of the data items fall
within 2 standard deviations of the mean. The figure shows that 95% of male adults
have heights between 62 inches and 78 inches.
b. The 68-95-99.7 Rule states that approximately 95% of the data items fall within
2 standard deviations of the mean. Since the mean is 70 inches, the figure shows
that half of the 95%, or 47.5% of male adults have heights between 70 inches and
78 inches.
c. The 68-95-99.7 Rule states that approximately 68% of the data items fall within 1
standard deviation of the mean; thus 32% of the data items fall outside this range. Half of
the 32%, or 16%, of male adults have heights above 74 inches.
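The three answers follow one pattern, sketched below. The helper names are assumptions for illustration, and the percentages are the approximate values given by the 68-95-99.7 Rule, not exact areas under the curve.

```python
# Approximate percentage of data within k standard deviations of the mean.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def interval_within(mean, sd, k):
    """Return (low, high, percent) for data within k standard deviations."""
    return (mean - k * sd, mean + k * sd, EMPIRICAL_RULE[k])

def percent_between_mean_and(k):
    """By symmetry, half of the central percentage lies on each side of the mean."""
    return EMPIRICAL_RULE[k] / 2

def percent_beyond(k):
    """Percentage in ONE tail, more than k standard deviations from the mean."""
    return (100 - EMPIRICAL_RULE[k]) / 2

mean_height, sd_height = 70, 4
print(interval_within(mean_height, sd_height, 2))   # part a
print(percent_between_mean_and(2))                  # part b
print(percent_beyond(1))                            # part c
```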
Objective 5 – Convert a data item to a z-score.
Key Terms
z-score
Summary
a. 342 days
b. 336 days
c. 333 days.
p. 814
Objective 6 – Understand percentiles and quartiles.
Key Terms
Percentiles
Quartiles
First quartile
Second quartile
Third quartile
Fourth quartile
Summary
A z-score measures a data item’s position in a normal distribution. Another measure of a data
item’s position is its percentile.
Percentiles are often associated with scores on standardized tests. If a score is in the
35th percentile, this means that 35% of the scores are less than this score. If a score
is in the 90th percentile, this indicates that 90% of the scores are less than this
score.
CHECKPOINT 6 – p. 815
A student scored in the 75th percentile on the SAT. What does this mean?
This means that 75% of the scores on the SAT are less than this student’s
score.
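The definition of a percentile, the percentage of scores that are less than a given score, can be sketched as follows. The list of scores is made up for illustration and the function name is an assumption.

```python
def percent_below(scores, score):
    """Percentage of the scores that are strictly less than the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores (assumed for this sketch).
all_scores = [400, 450, 500, 550, 600, 650, 700, 750]

# Six of the eight scores are less than 700.
print(percent_below(all_scores, 700))
```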
Objective 7 – Use and interpret margins of error.
Key Terms
Margin of error
Summary
For example, a random sample of 800 students was surveyed, and 62%
of the students said they struggle with showing up to class on time. The
margin of error for this survey is
$\pm \frac{1}{\sqrt{n}} \times 100\% = \pm \frac{1}{\sqrt{800}} \times 100\% \approx \pm 0.035 \times 100\% = \pm 3.5\%$.
There is a 95% probability that the true population percentage lies
between
the sample percent $- \frac{1}{\sqrt{n}} \times 100\% = 62\% - 3.5\% = 58.5\%$
and
the sample percent $+ \frac{1}{\sqrt{n}} \times 100\% = 62\% + 3.5\% = 65.5\%$.
We can be 95% confident that between 58.5% and 65.5% of all students
struggle with showing up to class on time.
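The survey computation can be reproduced with a short sketch. The function names are assumptions for illustration; the formula is the $\pm \frac{1}{\sqrt{n}} \times 100\%$ margin of error used in this example.

```python
from math import sqrt

def margin_of_error(n):
    """Margin of error, in percentage points, for a random sample of size n."""
    return 1 / sqrt(n) * 100

def interval(sample_percent, n):
    """Range that contains the population percentage with 95% probability."""
    moe = margin_of_error(n)
    return (sample_percent - moe, sample_percent + moe)

moe = margin_of_error(800)
low, high = interval(62, 800)
print(round(moe, 1))                  # margin of error, in percent
print(round(low, 1), round(high, 1))  # lower and upper bounds, in percent
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.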
CHECKPOINT 7 – p. 817
Objective 8 – Recognize distributions that are not normal.
Key Terms
Skewed
Skewed to the right
Skewed to the left
Summary
Although the normal distribution is the most important of all distributions in terms
of analyzing data, not all data can be approximated by this symmetric distribution
with its mean, median, and mode all having the same value.
A distribution of data is skewed if a large number of data items are piled up at one
end or the other, with a “tail” at the opposite end. In the distribution of weekly
earnings in Figure 12.22, the tail is to the right. Such a distribution is said to be
skewed to the right. By contrast to the distribution of weekly earnings, the
distribution in Figure 12.23 has more data items at the high end of the scale than at
the low end. The tail of this distribution is to the left and the distribution is said to
be skewed to the left.
Skewed to the right
Figure 12.21 (p. 818) is a histogram representing frequencies of the ages
of women interviewed by Kinsey and his associates. Is the shape of this
distribution best classified as normal, skewed to the right, or skewed to
the left? Explain.
Practice 73a. The histogram shows murder rates per 100,000 residents
and the number of U.S. states that had these rates for a recent year. Is the
shape of this distribution best classified as normal, skewed to the right, or
skewed to the left?
Section 12.5 Problem Solving with the Normal Distribution
Key Terms
z-score
Percentile
Summary:
A z-score table shows the percentage of values (the percentage of the area under the curve) to the left of
a given z-score on a standard normal distribution.
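As a rough stand-in for looking up a printed table, the percentage to the left of a z-score can be computed from the standard normal distribution with `math.erf`. This is a sketch under that assumption; a printed table may round its entries differently. The z-score helper uses the standard definition, (data item - mean) / standard deviation, and the inputs are sample values.

```python
from math import erf, sqrt

def z_score(data_item, mean, standard_deviation):
    """Number of standard deviations a data item lies above or below the mean."""
    return (data_item - mean) / standard_deviation

def percent_to_left(z):
    """Percentage of a standard normal distribution lying to the left of z."""
    return 50 * (1 + erf(z / sqrt(2)))

# Sample input (assumed): data item 78, mean 70, standard deviation 4.
z = z_score(78, 70, 4)
print(z, round(percent_to_left(z), 2))
print(round(percent_to_left(0), 2))   # half the data lies below the mean
```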
p. 823
p. 824
p. 825
Section 12.6 Scatter Plots, Correlation, and Regression Lines
Objective 1 – Make a scatter plot for a table of data items.
Objective 2 – Interpret information given in a scatter plot.
Objective 3 – Compute the correlation coefficient.
Objective 4 – Write the equation of the regression line.
Objective 5 – Use a sample’s correlation coefficient to determine whether there is a correlation in the
population.
Key Terms
Scatter plot
Summary
When two data items are collected for every person or object in a sample,
the data items can be visually displayed using a scatter plot.
There are at least 3 possible explanations:
Establishing that one thing causes another is extremely difficult, even if there is a
strong correlation between these things.
For example, as the temperature increases, there is an increase in the number of
people stung by jellyfish at the beach. This does not mean that an increase in the
temperature causes more people to be stung. It might mean that, because it is hotter,
more people go in the water. With an increased number of swimmers, more people
are likely to be stung by jellyfish.
Objective 2 – Interpret information given in a scatter plot.
Key Terms
Regression line
Correlation
Positively correlated
Negatively correlated
No correlation
Correlation coefficient
Summary
This figure shows the scatter plot for the education-prejudice data. There is also a straight line that
seems to approximately “fit” the data points. Most of the data points lie either near or on this line.
So, a scatter plot like the one above can be used to determine whether two
quantities are related.
Figure “a” shows a value of 1. This indicates a perfect positive
correlation.
Figure “g” shows a value of -1. This indicates a perfect negative
correlation.
Figures “b” and “c” are positively correlated but not perfectly. An
increase in one variable tends to be accompanied by an increase in the
other.
Figures “e” and “f” are negatively correlated but not perfectly. An
increase in one variable tends to be accompanied by a decrease in the
other.
Figure “d” shows that there is no correlation between the two variables.
Objective 3 – Compute the correlation coefficient.
Key Terms
Correlation coefficient
Summary
When computing the correlation coefficient by hand, organize
your work in five columns: x, y, xy, x², and y². Find the sum of
the numbers in each column. Then, substitute these values into
the formula for r.
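The five-column procedure can be sketched in code. This assumes the standard computational formula $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$; the function name and the sample data are assumptions for illustration.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Compute r from the five column sums: x, y, xy, x^2, y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: y rises exactly in step with x, then falls exactly in step.
xs = [1, 2, 3, 4]
r_up = correlation_coefficient(xs, [2, 4, 6, 8])
r_down = correlation_coefficient(xs, [8, 6, 4, 2])
print(round(r_up, 2), round(r_down, 2))
```

The two made-up data sets give the extreme cases: a perfect positive and a perfect negative correlation.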
Objective 4 – Write the equation of the regression line.
Key Terms
Regression line
Summary
$m = \frac{10(2099.2) - (359)(44.3)}{10(17{,}983) - 128{,}881} = \frac{5088.3}{50{,}949} \approx 0.1$
y = 0.1x + 0.8
= 0.1(80) + 0.8
= 8.8
The death rate would be 8.8 per 100,000 people.
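The slope, intercept, and prediction can be reproduced with a short sketch. The column sums (n = 10, Σx = 359, Σy = 44.3, Σxy = 2099.2, Σx² = 17,983) are read off the slope computation above and are an inference, since the data table itself is not reproduced here. The intercept uses the standard formula $b = \frac{\sum y - m\sum x}{n}$, and the function name is an assumption.

```python
def regression_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return slope m and y-intercept b of the regression line y = mx + b."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Sums inferred from the worked slope computation (assumption).
m, b = regression_line(n=10, sum_x=359, sum_y=44.3, sum_xy=2099.2, sum_x2=17983)

# Round to one decimal place, as in the worked solution, before predicting.
m_rounded, b_rounded = round(m, 1), round(b, 1)
prediction = m_rounded * 80 + b_rounded
print(m_rounded, b_rounded, round(prediction, 1))
```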
Objective 5 – Use a sample’s correlation coefficient to determine whether there is a
correlation in the population.
Key Terms
The Level of Significance of r
Summary
These values are shown in the second and third columns of Table 12.18
(p. 834). They depend on the sample size, n, listed in the left column. If
|r| , the absolute value of the correlation coefficient computed for the
sample, is greater than the value given in the table, a correlation exists
between the variables in the population.
Yes, |𝑟| = 0.89. Since 0.89 is greater than both 0.632 and 0.765 (the values
from Table 12.18, p. 834), we may conclude that a correlation does exist.
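The comparison against the table can be sketched as a simple check. The function name is an assumption; the critical values 0.632 and 0.765 are passed in as read from the table for this example, not looked up by the code.

```python
def correlation_in_population(r, critical_value):
    """True if |r| exceeds the critical value taken from the table."""
    return abs(r) > critical_value

r = 0.89
critical_values = [0.632, 0.765]   # values from the table for this example

results = [correlation_in_population(r, c) for c in critical_values]
print(results)

# A weaker sample correlation would not pass the same test.
print(correlation_in_population(0.5, 0.632))
```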