
Sampling Techniques and Data Presentation

Uploaded by

hys25jfmnt
Copyright
© All Rights Reserved

Section 12.1 Sampling, Frequency Distributions, and Graphs


Objective 1 – Describe the population whose properties are to be analyzed.
Objective 2 – Select an appropriate sampling technique.
Objective 3 – Organize and present data.
Objective 4 – Identify deceptions in visual displays of data.

Objective 1 – Describe the population whose properties are to be analyzed.

Key Terms
Population
Sample
Summary

Descriptive statistics (Turkish: betimleyici istatistik)

Inferential statistics (Turkish: çıkarımsal istatistik)

For example, board members of a homeowner’s association want to
survey the homeowners to get their opinions about new paint colors for
common walls.
The population is the set containing all the homeowners in the
neighborhood.
Example: A city council of a small city needs to know whether its
residents will support a renovation of the city park. Describe the
population.
The population is the set containing all residents of the city.

CHECKPOINT 1 - (p. 773)
A city government wants to conduct a survey among the city’s
homeless to discover their opinions about required residence in
city shelters from midnight until 6 a.m.

a) Describe the population.


b) A city commissioner suggests obtaining a sample by
surveying all the homeless people at the city’s largest
shelter on a Sunday night. Does this seem like a good idea?
Explain your answer.
Solution
a. The population is the set containing all of the city’s
homeless people.
b. This is not a good idea. This sample of people currently in a
shelter is more likely to hold opinions that favor required
residence in city shelters than the population of all the city’s
homeless.

Objective 2 – Select an appropriate sampling technique.
Key Terms
Random sample, Representative sample

Summary

For example, consider the scenario about board members of a
homeowner’s association who want to survey the homeowners to get
their opinions about new paint colors for common walls.
To obtain a random sample, the board can assign each house a number,
say 1 through 250 if there are 250 houses in the neighborhood. They can
randomly choose 25 numbers, then gather the opinions of the homeowners of
these properties. The sample is representative of the population since
the people surveyed are homeowners within the neighborhood.
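
The numbering scheme above can be sketched in Python. This is only an illustration, assuming the 250 houses and the sample of 25 from the example; the function name `choose_houses` and its `seed` parameter are hypothetical and not part of the original notes.

```python
import random

def choose_houses(num_houses=250, sample_size=25, seed=None):
    """Pick house numbers so that every house has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes a demonstration repeatable
    # sample() draws without replacement, so no house is chosen twice
    return sorted(rng.sample(range(1, num_houses + 1), sample_size))

selected = choose_houses(seed=1)
print(selected)
```

The board would then survey the homeowners at the printed house numbers.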

CHECKPOINT 2 - p. 775

A city government wants to conduct a survey among the city’s
homeless to discover their opinions about required residence in city
shelters from midnight until 6 a.m.

“A city commissioner suggests obtaining a sample by surveying all
the homeless people at the city’s largest shelter on a Sunday night.”

Explain why the sampling technique described above is not a random
sample. Then describe an appropriate way to select a random sample
of the city’s homeless.

The sampling technique described here does not produce a random
sample, because homeless people who do not go to shelters have no
chance of being selected for the survey. In this instance, an
appropriate method would be to randomly select neighborhoods of
the city and then randomly survey homeless people within the
selected neighborhoods.
Practice (Exercises 12.1 p. 782)
The government of a large city needs to determine whether the city’s
residents will support the construction of a new jail. The government
decides to conduct a survey of a sample of the city’s residents.
Which one of the following procedures would be most appropriate
for obtaining a sample of the city’s residents?
a. Survey a random sample of the employees and inmates at the
old jail.
b. Survey every fifth person who walks into City Hall on a given
day.
c. Survey a random sample of persons within each geographic
region of the city.
d. Survey the first 200 people listed in the city’s telephone
directory.

Objective 3 – Organize and present data.

Key Terms
Data item
Data value
Frequency distribution
Frequency

Summary
After data have been collected from a sample of the population, the next
task is to present the data in a condensed and manageable form, so the
data can be more easily interpreted.
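
A frequency distribution can be sketched in a few lines of Python. The data below are the pop-quiz scores used in the Section 12.2 examples; the table layout is an assumption made for illustration.

```python
from collections import Counter

scores = [8, 10, 9, 9, 6, 8, 7, 9]  # pop-quiz scores from Section 12.2

# Counter maps each data value to its frequency (how often it occurs)
frequency = Counter(scores)

print("Score  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies add up to the number of data items, which is a quick check that no item was missed.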

[Figures: a histogram, a frequency polygon, and a histogram with a superimposed frequency polygon]

Objective 4 – Identify deceptions in visual displays of data.

Key Terms
None
Summary

Visual displays of data can be deceiving. Graphs can be used to
distort the underlying data by choosing scales that can inflate or
deflate a trend.

Six Things to Watch for in Visual Displays of Data
1. Is there a title that explains what is being displayed?
2. Are numbers lined up with tick marks on the vertical axis that clearly indicate
the scale? Has the scale been varied to create a more or less dramatic impression
than shown by the actual data?
3. Do too many design and cosmetic effects draw attention from or distort the
data?
4. Has the wrong impression been created about how the data are changing
because equally spaced time intervals are not used on the horizontal axis?
Furthermore, has a time interval been chosen that allows the data to be interpreted
in various ways?
5. Are bar sizes scaled proportionately in terms of the data they represent?
6. Is there a source that indicates where the data in the display came from? Do
the data come from an entire population or a sample? Was a random sample used
and, if so, are there possible differences between what is displayed in the graph and
what is occurring in the entire population? Who is presenting the visual display, and
does that person have a special case to make for or against the trend shown by the
graph?

Guided Example
What is misleading in each graphic display?
The dollar bills in the display vary in both length and width to show the
diminishing purchasing power of the dollar over time. However, only the
length of each dollar bill is proportional to its spending power. Because
our eyes focus on the areas of the dollar-shaped bars, this creates the
impression that the purchasing power of the dollar diminished even more
than it really did. If the area of the dollar were drawn to reflect its
purchasing power, the 2005 dollar would be approximately twice as large as
the one shown in the graphic display.

Cosmetic effects of homes with equal heights, but different frontal
additions and shadow lengths, make it impossible to tell if they
proportionately depict the given areas. Time intervals on the horizontal
axis are not uniform in size, making it appear that dwelling swelling has
been linear from 1980 through 2010. The data indicate that this is not the
case. There was a greater increase in area from 1980 through 1990,
averaging 34 square feet per year, than from 1990 through 2010, averaging
15.6 square feet per year.

Practice 33 (page 784) Describe what is misleading in the visual display of
data.

The bars on the horizontal axis are evenly spaced, yet the time intervals
that they represent vary greatly. This may give the misleading impression
of linear growth.

The sectors representing these six countries use up 100% of the pie
graph, yet the percentages for these six countries total only 57%. This
may give the misleading impression that the U.S. has about 50% of the
world’s computer use.

The sizes of the books are not scaled proportionally in terms of the data
they represent.

The sizes of the TV screens are not scaled proportionally in terms of the
data they represent.

Each film’s star extends above the bar, giving a misimpression of the data
represented.

Section 12.2 Measures of Central Tendency
Objective 1 – Determine the mean for a data set.
Objective 2 – Determine the median for a data set.
Objective 3 – Determine the mode for a data set.
Objective 4 – Determine the midrange for a data set.

Objective 1 – Determine the mean for a data set.

Key Terms
Measures of central tendency
Mean

Summary
Numbers that represent an average are known as measures of central tendency.
Four such measures are: the mean, the median, the mode and the midrange.

Are inventors born or made? Find the
mean percentage of adults in the ten
countries who agree that inventiveness
can be learned.


Figure 12.6

CHECKPOINT 1 – p.787

Use Figure 12.6 (p. 787) to find the mean percentage of adults in the ten
countries who agree that inventiveness is inherited.

$\text{Mean} = \frac{\sum x}{n}$

The mean percentage of adults in the ten countries who agree that
inventiveness is inherited is 20.1%.
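
The mean formula can be sketched in Python. Because the percentages in Figure 12.6 are not reproduced in these notes, the sketch uses the pop-quiz scores from the median example instead; the function name `mean` is chosen here for illustration.

```python
def mean(data):
    """Mean = (sum of the data items) / (number of data items)."""
    return sum(data) / len(data)

quiz_scores = [8, 10, 9, 9, 6, 8, 7, 9]
print(mean(quiz_scores))  # the scores sum to 66, and 66 / 8 = 8.25
```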

p. 788

Objective 2 – Determine the median for a data set.

Key Terms
Median

Summary
Another type of measure of central tendency is called
the median, which is the middle number of a data set.

For example, consider the scores on the pop quiz: 8, 10, 9, 9, 6, 8, 7, and 9. First,
rearrange the scores in order from smallest to largest: 6, 7, 8, 8, 9, 9, 9, 10. Since
the number of data items is even, the median is the mean of the two middle data
items. The middle two data items are 8 and 9. The mean of these two numbers is
$\bar{x} = \frac{\sum x}{n} = \frac{8 + 9}{2} = \frac{17}{2} = 8.5$. Thus, the median quiz score is 8.5.

CHECKPOINT 3 – p. 789
Find the median for each of the following groups of data:
a. 28, 42, 40, 25, 35
b. 72, 61, 85, 93, 79, 87

a. First arrange the data items from smallest to largest: 25, 28, 35, 40, 42
The number of data items is odd, so the median is the middle number. The median
is 35.
b. First arrange the data items from smallest to largest: 61, 72, 79, 85, 87, 93
The number of data items is even, so the median is the mean of the two middle data
items.
The median is $\frac{79 + 85}{2} = \frac{164}{2} = 82$.
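
The odd and even cases can be sketched as one Python function. This is a minimal illustration of the procedure described above (sort, then take the middle item or the mean of the two middle items); the function name `median` is chosen here and is not from the textbook.

```python
def median(data):
    """Middle item of the sorted data, or the mean of the two middle items."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle item
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count

print(median([8, 10, 9, 9, 6, 8, 7, 9]))    # pop-quiz scores
print(median([28, 42, 40, 25, 35]))         # Check Point 3a
print(median([72, 61, 85, 93, 79, 87]))     # Check Point 3b
```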

p. 791

The data items are arranged from smallest to largest with n = 19, which gives a
median position of $\frac{n + 1}{2} = \frac{19 + 1}{2} = 10$.

The median is in the 10th position, which means the median is 5.

Why such a big difference between these two measures of central
tendency? The relatively high annual salary of the section manager,
$95,000, pulls the mean salary to a value considerably higher than the
median salary. When one or more data items are much greater than the
other items, these extreme values can greatly influence the mean. In cases
like this, the median is often more representative of the data.

Check Point 5 – p. 791

Check Point 6 – p. 792

The total frequency is 1 + 1 + 1 + 3 + 1 + 2 + 2 + 2 + 1 + 2 + 1 + 1 = 18; therefore, n = 18.

The median’s position is $\frac{n + 1}{2} = \frac{18 + 1}{2} = \frac{19}{2} = 9.5$.

Therefore, the median is the mean of the data items in positions 9 and 10.
Counting through the frequency row identifies that the 9th data item is 54 and the
10th data item is 55.

Thus, the median is $\frac{54 + 55}{2} = \frac{109}{2} = 54.5$.

Check Point 7 – p. 794

c. The mean is so much greater than the median because one data item, Trump’s
net worth, was much greater than those of the other presidents.

Objective 3 – Determine the mode for a data set.

Key Terms:
Mode
Summary
The third type of measure of central tendency is the mode.

For example, the scores on a pop quiz are: 8, 10, 9, 9, 6, 8, 7, and 9. The
score of 9 occurs 3 times. The mode of this data set is 9.

However, if the scores on a pop quiz are: 8, 10, 9, 9, 6, 8, 7, and 5, then
there are two modes, 8 and 9, since each occurs twice.

Finally, if the scores on a pop quiz are: 8, 10, 9, 4, 6, 3, 7, and 5, then
there is no mode since no data item occurs more than once.

CHECKPOINT 8 – p. 795
Find the mode for each of the following groups of data:
a. 3, 8, 5, 8, 9, 10
b. 3, 8, 5, 8, 9, 3
c. 3, 8, 5, 6, 9, 10

a. The mode is 8 (because 8 occurs most often).


b. The modes are 3 and 8 (because both 3 and 8 occur most often).
c. There is no mode (because each data item occurs the same number of
times).
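
The three cases (one mode, two modes, no mode) can be sketched in Python. The sketch assumes the convention used in the examples above: when no data item occurs more than once, the data set has no mode. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # no item repeats, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([3, 8, 5, 8, 9, 10]))  # Check Point 8a
print(modes([3, 8, 5, 8, 9, 3]))   # Check Point 8b
print(modes([3, 8, 5, 6, 9, 10]))  # Check Point 8c
```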

Objective 4 – Determine the midrange for a data set.

Key Terms: Midrange


Summary:

For example, find the midrange of scores on a pop quiz: 8, 10, 9, 9, 6, 8, 7, and 9.
The lowest score is 6 and the highest score is 10. The midrange is
$\text{Midrange} = \frac{\text{lowest data value} + \text{highest data value}}{2} = \frac{6 + 10}{2} = \frac{16}{2} = 8$.

The midrange of quiz scores is 8.
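
As a quick check, the midrange calculation can be written as a one-line Python function; the name `midrange` is chosen here for illustration.

```python
def midrange(data):
    """Midrange = (lowest data value + highest data value) / 2."""
    return (min(data) + max(data)) / 2

print(midrange([8, 10, 9, 9, 6, 8, 7, 9]))  # (6 + 10) / 2
```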

CHECKPOINT 9 – p. 796

CHECKPOINT 10 – p. 796

Section 12.3 Measures of Dispersion
Objective 1 – Determine the range for a data set.
Objective 2 – Determine the standard deviation for a data set.

Objective 1 – Determine the range for a data set.

Key Terms
Range

Summary

31.7 °C − 16.1 °C = 15.6 °C

For example, find the range of scores on a pop quiz: 8, 10, 9, 9, 6, 8, 7,
and 9. The highest data value is 10 and the lowest data value is 6.
The range is 10 − 6 = 4.
CHECKPOINT – p. 801
Find the range for the following group of data items:
4, 2, 11, 7
Range = 11 − 2 = 9.
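
The range is the difference between the highest and lowest data values, which can be sketched as follows. The name `data_range` is hypothetical, chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(data):
    """Range = highest data value - lowest data value."""
    return max(data) - min(data)

print(data_range([8, 10, 9, 9, 6, 8, 7, 9]))  # pop-quiz scores
print(data_range([4, 2, 11, 7]))              # Check Point data
```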

Objective 2 – Determine the standard deviation for a data set.

Key Terms
Standard deviation
Summary

The standard deviation of a sample is symbolized by s, while the standard deviation of an entire
population is symbolized by σ (the lowercase Greek letter sigma).

The standard deviation for the four oldest U.S. presidents is approximately
2.16 years.
CHECKPOINT 2 & 3 - p. 802
Find the mean (Check Point 2) and the standard deviation (Check Point 3) for the
following group of data items: 2, 4, 7, 11. Round to two decimal places.

$\text{Mean} = \frac{2 + 4 + 7 + 11}{4} = \frac{24}{4} = 6$

$\text{Standard deviation} = \sqrt{\frac{46}{4 - 1}} = \sqrt{\frac{46}{3}} \approx 3.92$
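
The 46 in the numerator is the sum of the squared deviations from the mean: (2 − 6)² + (4 − 6)² + (7 − 6)² + (11 − 6)² = 16 + 4 + 1 + 25. A minimal Python sketch of the sample standard deviation, dividing by n − 1 as in the solution above, is shown below; the function name is chosen here for illustration.

```python
import math

def sample_standard_deviation(data):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))

items = [2, 4, 7, 11]
print(round(sample_standard_deviation(items), 2))
```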

CHECKPOINT 4 - p. 805

As the spread of data items increases, the standard deviation gets larger.
Although both samples have the same mean, the data items in sample B are more spread out.
Therefore, it has a greater standard deviation.

CHECKPOINT 5 - p. 805

Section 12.4 The Normal Distribution
Objective 1 – Recognize the characteristics of normal distributions.
Objective 2 – Understand the 68-95-99.7 Rule
Objective 3 – Find scores at a specified standard deviation from the mean.
Objective 4 – Use the 68-95-99.7 Rule
Objective 5 – Convert a data item to a z-score.
Objective 6 – Understand percentiles and quartiles.
Objective 7 – Use and interpret margins of error.
Objective 8 – Recognize distributions that are not normal.

Objective 1 – Recognize characteristics of normal distributions.


Key Terms
Normal distribution, Symmetrical, Bell curve, Standard deviation
Summary:

The ends are lower and the middle bar is the highest.

As the sample size increases, the bars on either side of the middle form a smoother curve upward. Both sides of
the center become more similar to one another as well.

The shape of the normal distribution depends on the mean and the standard
deviation. Here are three normal distributions with the same mean but different
standard deviations. The highest point of all three graphs is in the middle. As the
standard deviation gets larger, the graph gets shorter and wider but retains its
symmetric bell shape. The normal distribution is used to make predictions about an
entire population using data from a sample.

Objective 2 – Understand the 68-95-99.7 Rule.

Key Terms
68-95-99.7 Rule
Standard deviation
Summary

The standard deviation plays a crucial role in the normal distribution,
summarized by the 68-95-99.7 Rule.

1. Approximately 68% of the data items fall within 1 standard deviation of the
mean (in both directions).
2. Approximately 95% of the data items fall within 2 standard deviations of the
mean.
3. Approximately 99.7% of the data items fall within 3 standard deviations of the
mean.

The figure shows that very little data lies more than 3 standard deviations
above or below the mean. Additionally, as we move from the mean, the
curve falls rapidly and then more gradually toward the horizontal axis.
The tails of the curve approach, but never touch, the horizontal axis, and
the range of the normal distribution is infinite.
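
The three percentages in the rule can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; it is an illustration added to these notes, not the textbook's method, and the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The printed values are close to 68, 95, and 99.7, which is why the rule says "approximately".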

Objective 3 – Find scores at a specified standard deviation from the mean.

Key Terms
Standard deviation

Summary
To find the score that lies x standard deviations above the mean, multiply the
standard deviation by x and add that product to the mean. In algebraic terms,
score = mean + x ⋅ standard deviation.

66 and 74 are 1 standard deviation from the mean.
62 and 78 are 2 standard deviations from the mean.
58 and 82 are 3 standard deviations from the mean.
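
The formula can be sketched in Python. The mean of 70 and standard deviation of 4 are assumed here because they reproduce the scores listed above and match the height distribution of Figure 12.13; a negative x gives a score below the mean. The function name `score_at` is hypothetical.

```python
def score_at(mean, standard_deviation, x):
    """score = mean + x * standard deviation (x < 0 means below the mean)."""
    return mean + x * standard_deviation

for x in (1, 2, 3):
    print(x, score_at(70, 4, -x), score_at(70, 4, x))
```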

p. 810

Solution:

Objective 4 – Use the 68-95-99.7 Rule.

Key Terms
68-95-99.7 Rule

Summary
Figure 12.13 (p. 810) shows the distribution of male adult heights in North America as a
normal distribution with a mean of 70 inches and a standard deviation of 4 inches.

p. 811.

a. The 68-95-99.7 Rule states that approximately 95% of the data items fall
within 2 standard deviations of the mean. The figure shows that 95% of male adults
have heights between 62 inches and 78 inches.

b. The 68-95-99.7 Rule states that approximately 95% of the data items fall within
2 standard deviations of the mean. Since the mean is 70 inches, the figure shows
that half of the 95%, or 47.5% of male adults have heights between 70 inches and
78 inches.

c. The 68-95-99.7 Rule states that approximately 68% of the data items fall within 1
standard deviation of the mean, thus 32% of the data falls outside this range. Half of
the 32%, or 16% of male adults, will have heights above 74 inches.

Objective 5 – Convert a data item to a z-score.

Key Terms
z-score
Summary

Example: Suppose a set of data is normally distributed with a mean of
100 and a standard deviation of 8. Convert each data item into a z-score.

a. 120: $z\text{-score} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{120 - 100}{8} = \frac{20}{8} = 2.5$

b. 90: $z\text{-score} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{90 - 100}{8} = \frac{-10}{8} = -1.25$

c. 100: $z\text{-score} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{100 - 100}{8} = \frac{0}{8} = 0$
CHECKPOINT 3 – p. 813
The length of horse pregnancies from conception to birth is normally
distributed with a mean of 336 days and a standard deviation of 3 days.
Find the z-score for a horse pregnancy of

a. 342 days
b. 336 days
c. 333 days.

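a. $z_{342} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{342 - 336}{3} = \frac{6}{3} = 2$

b. $z_{336} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{336 - 336}{3} = \frac{0}{3} = 0$

c. $z_{333} = \frac{\text{data item} - \text{mean}}{\text{standard deviation}} = \frac{333 - 336}{3} = \frac{-3}{3} = -1$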
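
The z-score conversion can be sketched as a short Python function and checked against the worked answers above; the function name `z_score` is chosen here for illustration.

```python
def z_score(data_item, mean, standard_deviation):
    """z-score = (data item - mean) / standard deviation."""
    return (data_item - mean) / standard_deviation

# Example data: mean 100, standard deviation 8
print(z_score(120, 100, 8), z_score(90, 100, 8), z_score(100, 100, 8))
# Horse pregnancies: mean 336 days, standard deviation 3 days
print(z_score(342, 336, 3), z_score(336, 336, 3), z_score(333, 336, 3))
```

A positive z-score means the data item is above the mean, a negative z-score means it is below, and a z-score of 0 means it equals the mean.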

p. 814

Objective 6 – Understand percentiles and quartiles.

Key Terms
Percentiles, Quartiles, First quartile, Second quartile, Third quartile, Fourth quartile

Summary
A z-score measures a data item’s position in a normal distribution. Another measure of a data
item’s position is its percentile.

Percentiles and Quartiles

Percentiles are often associated with scores on standardized tests. If a score is in the
35th percentile, this means that 35% of the scores are less than this score. If a score
is in the 90th percentile, this indicates that 90% of the scores are less than this
score.

CHECKPOINT 6 – p. 815
A student scored in the 75th percentile on the SAT. What does this mean?
This means that 75% of the scores on the SAT are less than this student’s
score.

Objective 7 – Use and interpret margins of error.

Key Terms
Margin of error

Summary

What does margin of error mean?

The margin of error, also known as the confidence interval, indicates how closely
you can expect your survey results to reflect the views of the overall
population. ... As the name suggests, the margin of error is the range of values
above and below the actual results obtained from a survey.

For example, a random sample of 800 students were surveyed and 62%
of students said they struggle with showing up to class on time. The
margin of error for this survey is
$\pm\frac{1}{\sqrt{n}} \times 100\% = \pm\frac{1}{\sqrt{800}} \times 100\% \approx \pm 0.035 \times 100\% = \pm 3.5\%$.
There is a 95% probability that the true population percentage lies
between
$\text{the sample percent} - \frac{1}{\sqrt{n}} \times 100\% = 62\% - 3.5\% = 58.5\%$
and
$\text{the sample percent} + \frac{1}{\sqrt{n}} \times 100\% = 62\% + 3.5\% = 65.5\%$.

We can be 95% confident that between 58.5% and 65.5% of all students
struggle with showing up to class on time.
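
The margin-of-error calculation can be sketched in Python. The function names are hypothetical, and rounding the margin to the nearest tenth of a percent before building the interval follows the worked example above; the second call uses the sample size and sample percent from Check Point 7.

```python
import math

def margin_of_error(n):
    """Margin of error in percent, using the ±1/sqrt(n) × 100% formula."""
    return 100 / math.sqrt(n)

def interval_95(sample_percent, n):
    """Sample percent minus and plus the margin rounded to one decimal place."""
    margin = round(margin_of_error(n), 1)
    return (sample_percent - margin, sample_percent + margin)

print(round(margin_of_error(800), 1), interval_95(62, 800))
print(round(margin_of_error(2513), 1), interval_95(36, 2513))
```

Because the margin shrinks like 1/√n, quadrupling the sample size only halves the margin of error.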

CHECKPOINT 7 – p. 817

A Harris Poll of 2513 U.S. adults ages 18 and older asked the question How many
books do you typically read in a year? The results of the poll are shown in
Figure 12.20 (p. 817).
a. Find the margin of error for this survey. Round to the nearest tenth of a percent.
b. Write a statement about the percentage of U.S. adults who read more than ten
books per year.

a. The sample size is n = 2513. The margin of error is
$\pm\frac{1}{\sqrt{n}} \times 100\% = \pm\frac{1}{\sqrt{2513}} \times 100\% \approx \pm 0.020 \times 100\% = \pm 2.0\%$.
b. There is a 95% probability that the true population percentage lies between
$\text{the sample percent} - \frac{1}{\sqrt{n}} \times 100\% = 36\% - 2.0\% = 34\%$
and
$\text{the sample percent} + \frac{1}{\sqrt{n}} \times 100\% = 36\% + 2.0\% = 38\%$.
We can be 95% confident that between 34% and 38% of Americans read more than ten
books per year.

Objective 8 – Recognize distributions that are not normal.
Key Terms
Skewed
Skewed to the right
Skewed to the left
Summary
Although the normal distribution is the most important of all distributions in terms
of analyzing data, not all data can be approximated by this symmetric distribution
with its mean, median, and mode all having the same value.

A distribution of data is skewed if a large number of data items are piled up at one
end or the other, with a “tail” at the opposite end. In the distribution of weekly
earnings in Figure 12.22, the tail is to the right. Such a distribution is said to be
skewed to the right. By contrast to the distribution of weekly earnings, the
distribution in Figure 12.23 has more data items at the high end of the scale than at
the low end. The tail of this distribution is to the left and the distribution is said to
be skewed to the left.
Skewed to the right

Skewed to the left

Figure 12.21 (p. 818) is a histogram representing frequencies of the ages
of women interviewed by Kinsey and his associates. Is the shape of this
distribution best classified as normal, skewed to the right, or skewed to
the left? Explain.

The distribution is skewed to the right because its tail extends to the right.

Practice 73a. The histogram shows murder rates per 100,000 residents
and the number of U.S. states that had these rates for a recent year. Is the
shape of this distribution best classified as normal, skewed to the right, or
skewed to the left?

The distribution is skewed to the right.

Section 12.5 Problem Solving with the Normal Distribution

Objective 1 – Solve applied problems involving normal distributions.

Key Terms
z-score
Percentile

Summary:
A z-score table shows the percentage of values (the area percentage) to the left of
a given z-score on a standard normal distribution.

• In a normal distribution, the mean, median, and mode all have a corresponding
z-score of 0.
• Table 12.16 (p. 822) shows that the percentile for a z-score of 0 is 50.00.
• Thus, 50% of the data items in a normal distribution are less than the mean,
median, and mode.
• Consequently, 50% of the data items are greater than or equal to the mean,
median, and mode.
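
The percentile that a z-score table lists can be approximated with the standard normal cumulative distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for Table 12.16; the function name `percentile_for_z` is hypothetical, and rounding to two decimal places is an assumption that mirrors the table entry of 50.00.

```python
from statistics import NormalDist

def percentile_for_z(z):
    """Percentage of a standard normal distribution lying to the left of z."""
    return round(100 * NormalDist().cdf(z), 2)

print(percentile_for_z(0))  # the mean, median, and mode sit at z = 0
```

Subtracting a percentile from 100 gives the percentage of data items above that z-score.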

p. 823

p. 824

p. 825

Section 12.6 Scatter Plots, Correlation, and Regression Lines
Objective 1 – Make a scatter plot for a table of data items.
Objective 2 – Interpret information given in a scatter plot.
Objective 3 – Compute the correlation coefficient.
Objective 4 – Write the equation of the regression line.
Objective 5 – Use a sample’s correlation coefficient to determine whether there is a correlation in the
population.

Objective 1 – Make a scatter plot for a table of data items.

Key Terms
Scatter plot

Summary
When two data items are collected for every person or object in a sample,
the data items can be visually displayed using a scatter plot.

There are at least 3 possible explanations:

Establishing that one thing causes another is extremely difficult, even if there is a
strong correlation between these things.
For example, as the temperature increases, there is an increase in the number of
people stung by jellyfish at the beach. This does not mean that an increase in the
temperature causes more people to be stung. It might mean that, because it is hotter,
more people go in the water. With an increased number of swimmers, more people
are likely to be stung by the jellyfish.

Objective 2 – Interpret information given in a scatter plot.

Key Terms
Regression line
Correlation
Positively correlated
Negatively correlated
No correlation
Correlation coefficient

Summary
This figure shows the scatter plot for the education-prejudice data. There is also a straight line that
seems to approximately “fit” the data points. Most of the data points lie either near or on this line.
So,

A scatter plot like the one above can be used to determine whether two
quantities are related.

Figure “a” shows a value of 1. This indicates a perfect positive
correlation.
Figure “g” shows a value of -1. This indicates a perfect negative
correlation.
Figures “b” and “c” are positively correlated but not perfectly. An
increase in one variable tends to be accompanied by an increase in the
other.
Figures “e” and “f” are negatively correlated but not perfectly. An
increase in one variable tends to be accompanied by a decrease in the
other.
Figure “d” shows that there is no correlation between the two variables.

Objective 3 – Compute the correlation coefficient.

Key Terms
Correlation coefficient

Summary
When computing the correlation coefficient by hand, organize
your work in five columns: x, y, xy, $x^2$, and $y^2$. Find the sum of
the numbers in each column. Then, substitute these values into
the formula for r.
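
The five-column procedure can be sketched in Python. The formula for r is not reproduced in these notes, so the sketch assumes the standard computational form, r = (nΣxy − ΣxΣy) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)). The data are made up for illustration: points that lie exactly on a rising line should give r = 1, and points on a falling line should give r = −1.

```python
import math

def correlation_coefficient(xs, ys):
    """Compute r from the sums of the x, y, xy, x^2, and y^2 columns."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4]  # hypothetical data, not from the textbook
print(correlation_coefficient(xs, [2, 4, 6, 8]))  # rising line
print(correlation_coefficient(xs, [8, 6, 4, 2]))  # falling line
```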

Objective 4 – Write the equation of the regression line.

Key Terms
Regression line

Summary

$m = \frac{10(2099.2) - (359)(44.3)}{10(17{,}983) - 128{,}881} = \frac{5088.3}{50{,}949} \approx 0.1$

$b = \frac{44.3 - (0.1)(359)}{10} = \frac{8.4}{10} \approx 0.8$

The equation of the regression line is y = 0.1x + 0.8.

The predicted rate in a country with 80 firearms per 100 persons
can be found by substituting 80 for x.

y = 0.1x + 0.8
= 0.1(80) + 0.8
= 8.8
The death rate would be 8.8 per 100,000 people.
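
The substitution above can be reproduced in Python. The general formulas are not shown in these notes, so the sketch assumes the forms that match the numbers substituted: m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = (Σy − mΣx) / n, with n = 10, Σx = 359, Σy = 44.3, Σxy = 2099.2, and Σx² = 17,983. The function name `regression_line` is hypothetical.

```python
def regression_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope m and y-intercept b of the regression line from column sums."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

m, b = regression_line(n=10, sum_x=359, sum_y=44.3, sum_xy=2099.2, sum_x2=17983)
m_rounded, b_rounded = round(m, 1), round(b, 1)
print(m_rounded, b_rounded)

# Predicted death rate for 80 firearms per 100 persons, using the rounded line
predicted = round(m_rounded * 80 + b_rounded, 1)
print(predicted)
```

The sketch computes b from the unrounded slope, whereas the hand calculation uses the rounded slope 0.1; both round to 0.8 here.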

Objective 5 – Use a sample’s correlation coefficient to determine whether there is a
correlation in the population.

Key Terms
The Level of Significance of r

Summary

These values are shown in the second and third columns of Table 12.18
(p. 834). They depend on the sample size, n, listed in the left column. If
|r|, the absolute value of the correlation coefficient computed for the
sample, is greater than the value given in the table, a correlation exists
between the variables in the population.

Yes, |r| = 0.89. Since 0.89 is greater than both 0.632 and 0.765 (using Table 12.18, p.
834), we may conclude that a correlation does exist.
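
The comparison can be sketched in Python. The two critical values are the ones quoted above from Table 12.18; the rest of the table is not reproduced here, and the function name `exceeds_table_values` is hypothetical.

```python
def exceeds_table_values(r, table_values):
    """For each table value, report whether |r| is greater than it."""
    return [abs(r) > value for value in table_values]

print(exceeds_table_values(0.89, [0.632, 0.765]))
```

Using the absolute value means a strong negative correlation, such as r = −0.89, passes the same test.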
