Lecture and Tutorial Answers
Challenge: page 31
2399/4597 = 0.5219
96/242 = 0.3967
96/2198 = 0.0437
(b)
945/4597 = 0.2056
In tables designed to convey information quickly and easily, include row or column averages if appropriate.
2.
l.
If a plot of numeric data has a long right tail we say the data are positively skewed.
p.
An outlier should be removed from the data set only if it is found to be a mistake.
r.
For highly skewed data, the sample median is a more sensible measure of the centre than the sample mean.
u.
About 75% of the observations have a value less than the upper quartile.
Alternative: About 25% of the observations have a value greater than the upper quartile.
(a)
v.
(c)
(iii)
bb. It is possible to explore more than two variables at a time, by using techniques such as colour gradients and subsetting.

                 Under 40   40 or over   Total
Mild cases             16           20      36
Serious cases          15           35      50
Total                  31           55      86

16/86 = 0.1860
(b)
(i)
(c)
15/50 = 0.3
(ii)
(d)
(20 + 35 + 15)/86 = 0.8140
20/55 = 0.3636
(iii)
35/86 = 0.4070
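Each of the answers above is a single table count divided by a row total, a column total or the grand total. As a small sketch (Python, with variable names made up for illustration), the same proportions can be recomputed directly from the table cells:

```python
# Counts from the two-way table (rows: type of case, columns: age group).
mild_under_40, mild_40_plus = 16, 20
serious_under_40, serious_40_plus = 15, 35

grand_total = mild_under_40 + mild_40_plus + serious_under_40 + serious_40_plus  # 86
serious_total = serious_under_40 + serious_40_plus  # 50
total_40_plus = mild_40_plus + serious_40_plus      # 55


def prop(count: int, total: int) -> float:
    """Proportion rounded to 4 decimal places, as in the answers."""
    return round(count / total, 4)


answers = {
    "16/86": prop(mild_under_40, grand_total),
    "15/50": prop(serious_under_40, serious_total),
    # Cells 20, 35 and 15 combined: every cell except mild cases under 40.
    "(20 + 35 + 15)/86": prop(mild_40_plus + serious_40_plus + serious_under_40, grand_total),
    "20/55": prop(mild_40_plus, total_40_plus),
    "35/86": prop(serious_40_plus, grand_total),
}
for label, value in answers.items():
    print(f"{label} = {value:.4f}")
```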
3. (a)
(c)
2.
9.
(a)
3. (a)
4. (e)
5. (c)
6. (e)
7. (a)
8. (a)
(b)
10. (d)
11. (c)
12. (d)
13. (e)
14. (c)
15. (c)
16. (c)
17. (a)
18. (b)
19. (b)
20. (d)
21. (a)
22. (c)
23. (c)
24. (a)
25. (a)
26. (e)
Tutorial: Pages 40 to 43
(a)
Scatter plot
(b)
Side-by-side plot on the same scale. (Dot plot, box plot, histogram.)
(c)
Bar graphs of the three countries, showing proportions for each level of Inversions.
(d)
2.
               In default   Not in default   Total
Low-risk             2185             7315    9500
High-risk             200              300     500
Total                2385             7615   10000
(b)
40% of 500 = 200
Total: 2385/10000 = 23.85%
(c)
200/2385 = 0.0839
(d)
300/7615 = 0.0394
(e)
(200/2385)/(300/7615) = 0.0839/0.0394 = 2.129
4.
Side-by-side plot on the same scale. (Dot plot, box plot, histogram.)
(2)
                 Drink daily   Not daily   Total
Male drinker             969        4131    5100
Female drinker           490        4410    4900
Total                   1459        8541   10000
490/1459 = 0.3358
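The default answers divide a cell of the risk-by-default table by a column total, and part (e) compares the two resulting proportions. The sketch below recomputes them in Python; the dictionary layout and names are illustrative only, and the low-risk "not in default" count of 7315 is obtained by subtraction (9500 − 2185) rather than read directly from the answers.

```python
# Loan customers by risk category and default status.
# Low-risk "not in default" (7315) is found by subtraction: 9500 - 2185.
in_default = {"Low-risk": 2185, "High-risk": 200}
not_in_default = {"Low-risk": 7315, "High-risk": 300}

total_default = sum(in_default.values())          # 2385
total_not_default = sum(not_in_default.values())  # 7615
grand_total = total_default + total_not_default   # 10000

default_rate = total_default / grand_total
p_high_among_default = in_default["High-risk"] / total_default              # 200/2385
p_high_among_not_default = not_in_default["High-risk"] / total_not_default  # 300/7615
ratio = p_high_among_default / p_high_among_not_default

# Drinking table: proportion of daily drinkers who are female (490/1459).
p_female_among_daily = 490 / 1459

print(f"Default rate: {default_rate:.2%}")
print(f"{p_high_among_default:.4f} {p_high_among_not_default:.4f} ratio {ratio:.3f}")
print(f"{p_female_among_daily:.4f}")
```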
Chapter 2
Tutorial: Page 9
Challenge: page 7
Media reports which describe two variables as being linked or associated cannot be interpreted as meaning that a change in one variable will cause a change in the other variable.
h.
If one of the treatment levels in an experiment involving people is a placebo, then blinding should be used.
2. (e)
3. (b)
1.
(i)
Study 1: We are comparing cats that fell from 1 or 2 storeys, cats that fell from 3 to 5 storeys and cats that fell from 6 or more storeys.
Study 3: We are comparing the old style and new style of television commercials.
(ii)
Study 1: To make the comparison, we are measuring the survival of the cats.
Study 2: To make the comparison, we are measuring the clothing expenditure for the next 3 months.
Study 3: To make the comparison, we are measuring the recall scores for the commercials.
(iii) Study 1: An observational study. There is no allocation (by the researcher) of subjects
(cats) to the number of storeys of the fall. Results are simply observed for cases that
happen.
Study 2: An observational study. There is no allocation (by the researcher) of subjects
(students) to the groups (male or female).
Study 3: An experiment. The researcher allocates which commercial is to be watched
by each subject (shopper).
(iv) It is not possible to do an experiment for study 1 due to ethical and moral considerations. To do an experiment, a sample of cats would have to be allocated a height and then thrown out of a window at that height.
It is not possible to do an experiment for study 2 as the researcher cannot allocate a
gender to a student.
2.
(2)
False.
The corrected version of the false statement is:
Random allocation of treatments to subjects does not guarantee comparability of treatment groups, particularly when we only have small numbers of subjects.
Chapter 3
3.
(a)
(i)
Blinding was used in this study: the technician doing the cleaning and assessing the results was not aware of which version of oven cleaner was used. (The ovens were also unaware of which cleaner was used.)
(ii)
(iii) There was no blocking in this study. (No other factors apart from version of oven
Challenge: page 13
The corrected version of the false statement is:
i.
When the tail proportion is large, the observed difference is not unusual when chance is acting alone, therefore chance COULD have been acting alone. All we can say is that we have no evidence against chance acting alone, but this cannot be interpreted as meaning that chance was acting alone.
(b)
The results of this study can be used to establish that any difference in mean
effectiveness score between the two versions of oven cleaner was caused by the
differences in the oven cleaners as there was random allocation of ovens to the two
treatment groups (current and new versions of oven cleaner).
(c)
Our tail probability of 0.17 means the observed difference is not unusual when chance
is acting alone, therefore chance COULD be acting alone.
(d)
We have no evidence that the new version of the oven cleaner gives higher
effectiveness scores, on average, than the current version.
(e)
We cannot conclude that the new version of the oven cleaner has the same effectiveness scores, on average, as the current version: chance COULD be acting alone OR something else as well as chance COULD also be acting (we don't have enough information to determine which one of these two possibilities applies).
(a)
Blinding could be used in this study, if the male patients did not know the type of
surgery they had.
(b)
The Re-randomised data plot shows one of the 1000 re-randomisations under chance alone. In this re-randomisation the difference between the re-randomised proportions, under chance alone, is 20/228 − 22/231 = −0.008.
(c)
Our tail proportion of 0.069 means that, when chance is acting alone, a difference
between the group proportions of 0.045 or more is highly unlikely. In the actual study,
an observed difference between the two proportions of 0.045 or more would have been
highly unlikely if chance had been acting alone, therefore we are pretty sure that
chance was not acting alone in the actual study.
(d)
We can conclude that the type of surgery had an impact on whether the male patient
had a major event (death, heart attack or stroke) within 2 years of their surgery. We
are pretty sure that chance was not acting alone in the study and the study is a
well-designed experiment in which the male patients were randomly allocated to one
of the two types of surgery so a causal claim may be made.
2. (b)
3. (e)
4. (c)
Tutorial: Pages 15 to 17
1.
In a randomisation test we are assessing the plausibility of the explanation that an observed difference between two groups is solely due to chance, i.e., is due to chance acting alone.
We can say that the chance alone explanation is implausible if our observed difference is
unusual when chance is acting alone.
In a randomisation test, we randomly re-assign each unit to a group and, with only chance
acting, we calculate the difference between the two groups.
We repeat this re-randomisation a large number of times, and plot all of the differences to
get the re-randomisation distribution. We can then see whether our observed difference
is unlikely under chance alone.
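The re-randomisation procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the course: it compares group means, counts the re-randomised differences at least as big as the observed one (a one-sided tail in the direction of the observed difference), and runs on made-up scores.

```python
import random
from statistics import mean


def randomisation_test(group_a, group_b, reps=1000, seed=1):
    """Return (observed difference in means, tail proportion).

    The tail proportion is the proportion of re-randomised differences
    (mean of A minus mean of B) at least as big as the observed difference.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # randomly re-assign every unit to a group
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            count += 1
    return observed, count / reps


# Hypothetical scores for two treatment groups (illustrative only).
new = [7, 9, 8, 6, 9, 8]
current = [6, 7, 5, 8, 6, 7]
obs, tail = randomisation_test(new, current)
print(f"observed difference = {obs:.2f}, tail proportion = {tail:.3f}")
```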
(a)
The tail proportion is the proportion of times we get a difference at least as big as our
observed difference when chance is acting alone.
(b)
(c)
4.
When the tail proportion in the re-randomisation distribution is 10% or more then:
the observed difference is not unusual when chance is acting alone, therefore chance COULD be acting alone.
Chapter 4
Challenge: Page 7
The corrected versions of the false statements are:
d. Sampling errors are smaller in larger samples than in smaller samples.
j. This is a common misconception. The size of the sampling error is not dependent on the
size of the population.
n. Using random sampling in statistical surveys does not guarantee that each sample will be representative, but it ensures that in the long run over repeated samples of data, the samples will, on average, be representative.
2. (a)
3. (c)
4. (e)
Tutorial: Page 9
1.
(4)
2.
(3)
The Hobbit, Wild Swans, The Power of One, April Fool's Day.
3.
(3)
False.
The corrected version of the false statement is:
It is unlikely that sophisticated sampling projections can correct the results if the
population you are sampling from is different to the one of interest.
4.
(5)
Chapter 5
6.
(a)
The parameter we are interested in is the proportion of New Zealand adults who, in
2015, had trust and confidence in Members of Parliament.
(b)
0.25
(c)
The blue vertical lines in the re-sample plot represent 1000 re-sample proportions taken with replacement from the original sample. They show the extent of the variation
Challenge: Page 13
The corrected version of the false statement is:
d
population.
2. (d)
3. (d)
4. (e)
5. (c)
7.
(a)
(b)
(c)
It is a fairly safe bet that the proportion of New Zealand adults who, in 2015, had trust
and confidence in Members of Parliament is somewhere between 2 and 12 percentage
points higher than the corresponding proportion in 2013.
(d)
Tutorial: Pages 16 to 18
1.
It is a fairly safe bet that the proportion of New Zealand adults who, in 2015, had trust
and confidence in Members of Parliament is somewhere between 0.21 and 0.28.
The lower and upper limits of the bootstrap confidence interval were obtained by not
including the bottom and top 2.5% re-sample proportions.
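The percentile method described above, dropping the bottom and top 2.5% of re-sample proportions, can be sketched in Python as follows. The counts are hypothetical (the original sample size is not given in these answers), so the output will not reproduce the 0.21 to 0.28 interval.

```python
import random


def bootstrap_proportion_ci(successes, n, reps=1000, seed=2):
    """Percentile bootstrap interval for a proportion.

    Each re-sample draws n observations with replacement from the original
    sample; the bottom and top 2.5% of re-sample proportions are excluded.
    """
    rng = random.Random(seed)
    sample = [1] * successes + [0] * (n - successes)
    props = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(reps))
    cut = int(0.025 * reps)  # 25 re-sample proportions in each tail when reps = 1000
    return props[cut], props[reps - cut - 1]


# Hypothetical sample: 25 "yes" answers out of 100 (illustrative only).
low, high = bootstrap_proportion_ci(25, 100)
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
```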
The process of using sample data to try and make useful statements about an unknown
parameter is called sample-to-population inference.
2.
We form intervals of believable values rather than just stating individual estimates to give
an indication of the level of uncertainty in the estimate.
3.
(a)
The parameter we are interested in is μFemale, the mean number of text messages sent per day by females for the population.
4.
5.
(b)
(c)
(d)
The re-samples are randomly selected from the original sample with replacement using
the same sample size as the original sample.
(e)
It is a fairly safe bet that the mean number of text messages sent per day by females
is somewhere between 14.3 and 28.4.
(a)
Each point in the bootstrap distribution represents the difference between the female
and male means when the original sample has been re-sampled with replacement.
(b)
It is a fairly safe bet that the mean number of text messages sent per day by females
is somewhere between 12.7 lower and 14.5 higher than the mean number of text
messages sent per day by males.
(c)
Selection bias: the sample is only from one company and is also only for the number of calls made on Tuesdays.
Chapter 6
4.
Challenge: Page 20
The corrected versions of the false statements are:
q
(a)
(3)
(b)
(5)
5. (a)
(i)
Parameter
(ii)
Estimate: x̄, the mean exam mark for the sample of thirty STATS 108 students = 31.97 marks.
There is no way of knowing whether a confidence interval actually contains the true
unknown value of the parameter. We simply take comfort in the fact that the method
works (i.e., produces confidence intervals that do contain the true value) most of the
time. For example, approximately 95% of 95% confidence intervals contain the true
value of the parameter.
Increasing the level of confidence increases the value of the t-multiplier and hence
increases the width of the confidence interval but has no effect on the value of the
estimate.
2. (c)
3. (d)
4. (b)
5. (a)
(vii) There are many ways of interpreting a confidence interval. Two different ways
follow.
6. (a)
(b)
Tutorial: Pages 22 to 25
Section A: Confidence intervals for a single mean or proportion
1.
2.
3.
t-multiplier = 2.045
μ, the population mean exam mark for the STATS 108 exam.
(1)
With 95% confidence, we estimate that the population mean exam mark is
somewhere between 28.44 and 35.50 marks.
(2)
With 95% confidence, we estimate that the population mean exam mark is
31.97 with a margin of error of 3.53.
We don't know. The population mean mark is not known, so we don't know whether this particular 95% confidence interval contains the population mean. However, in the long run, the population mean will be contained in 95% of the 95% confidence intervals calculated from such samples.
A measure of the amount a sample estimate varies from sample to sample is called the
standard error of the estimate.
It roughly measures the average distance between an estimate and the population
parameter over all possible samples of a given size that can be taken from the population.
1.
(4)
2.
(a)
(b)
(c)
(d)
(a)
(b)
If we were to take a huge number of samples of 40 days and calculate their sample means
then we estimate that the average distance these sample means would be from the true
population mean would be roughly 2.12.
The formula for calculating a confidence interval is:
estimate ± t-multiplier × standard error(estimate)
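As a quick check of the formula, the sketch below rebuilds the exam-mark interval from Section A (estimate 31.97, t-multiplier 2.045, margin of error 3.53). The standard error itself is not printed in these answers, so it is back-calculated here as 3.53 / 2.045; treat that value as an assumption.

```python
def confidence_interval(estimate, t_multiplier, std_error):
    """Return (lower, upper) = estimate -/+ t-multiplier x standard error."""
    margin_of_error = t_multiplier * std_error  # half the width of the interval
    return estimate - margin_of_error, estimate + margin_of_error


# Exam-mark example: estimate 31.97, t-multiplier 2.045 (from the answers).
# ASSUMPTION: se back-calculated from the reported margin of error, 3.53 / 2.045.
se = 3.53 / 2.045
lower, upper = confidence_interval(31.97, 2.045, se)
print(f"{lower:.2f} to {upper:.2f}")  # 28.44 to 35.50
```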
3.
For a specified level of confidence and number of degrees of freedom, a t-multiplier is the
number of standard errors between the estimate and each confidence limit.
For a given level of confidence, the t-multiplier decreases as the degrees of freedom
increase.
The t-multiplier multiplied by the standard error of the estimate is called the margin of error
and is half the width of a confidence interval.
(ii)
prisoners who were infected with TB and the proportion of Gypsy Spanish
prisoners who were infected with TB.
Estimate: p̂W − p̂G, the difference in the proportion of the sample of white Spanish prisoners who were infected with TB and the proportion of the sample of Gypsy Spanish prisoners who were infected with TB.
496 74
(iv) se(p̂W − p̂G)
Sampling situation (a)
se(p̂W − p̂G) = 0.0438
(v)
(3)
Chapter 7
3.
Challenge: Page 18
The corrected versions of the false statements are
h. If the null hypothesis (parameter = hypothesised value) is not true, do we expect parameter > hypothesised value (1-sided), or parameter < hypothesised value (1-sided), or don't we know which direction, in which case we use parameter ≠ hypothesised value (2-sided)? We must not use the data to decide which relation (>, < or ≠) to use in the alternative hypothesis.
l. A large P-value means that we have nothing against the null hypothesis so it could be true . . . which doesn't mean that it is true! (See Question (p).)
q. Same as l above. Nonsignificant results (large P-values) do not mean that the null hypothesis is true; they mean it could be true.
x. See Example 1, page 12 (birth month effect on height example). It is possible to have established the existence of an effect (small P-value, statistical significance) but for that effect to be so small as to be of no practical importance/significance. (See Question (w).)
P-value      Evidence against H0
> 0.10       none
0.10         weak
0.05         some
0.01         strong
0.001        very strong
4.
P-value < 5%
5.
Nothing.
6.
A confidence interval.
7.
A one-tailed test is used when the investigators have good grounds, before the study began,
for believing the departure from the null hypothesis goes in one particular direction.
Otherwise, or if in doubt, a two-tailed test is used. Good grounds mean that there is prior
information or there is a theory to tell the investigators which way the study is likely to go.
8.
The t-test statistic measures the number of standard errors the estimate is away from the hypothesised value.
The more standard errors the estimate is away from the hypothesised value, the larger the magnitude of the t-test statistic.
The larger the magnitude of the t-test statistic, the smaller the resulting P-value and hence the stronger the evidence against the null hypothesis.
The smaller the magnitude of the t-test statistic, the larger the resulting P-value and hence the weaker the evidence against the null hypothesis.
2. (c)
3. (e)
4. (b)
5. (d)
6. (e)
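The t-test statistic described above is a single division. This Python sketch applies it to the two tutorial examples in this chapter (the laundry revenues and the TB proportions), using the estimates and standard errors quoted in the answers; the function name is illustrative.

```python
def t_statistic(estimate, hypothesised_value, std_error):
    """Number of standard errors the estimate is away from the hypothesised value."""
    return (estimate - hypothesised_value) / std_error


t_laundry = t_statistic(33.724, 0, 17.449)  # difference in mean daily revenue
t_tb = t_statistic(0.0730, 0, 0.0438)       # difference in proportions with TB
print(round(t_laundry, 3), round(t_tb, 4))  # 1.933 1.6667
```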
9.
Tutorial: Pages 20 to 22
When we deal with studies in which the data have been produced by:
random sampling of units from a population or populations, we can make sample-to-population inferences.
Section A: Quiz
1.
We test the null hypothesis and determine how much evidence we have against it.
The null hypothesis usually takes a sceptical point of view: the researcher's hunch is nonsense, there is nothing new or interesting happening, there is no effect.
In most situations the researcher hopes to disprove or reject H0.
The alternative hypothesis corresponds to the research hypothesis. It usually takes the
form that something is happening, there is a difference or an effect, there is a relationship.
In most situations the researcher hopes to give support to H1 by showing that H0 is not
believable.
2.
To measure the strength of evidence against the null hypothesis, we calculate a P-value.
The P-value is the conditional probability of observing a test statistic at least as extreme as
that observed, given that the null hypothesis is true.
We can estimate P-values either by a theory-based approach (e.g. t-tests) or a simulation-based approach (e.g. randomisation tests).
who were infected with TB and the proportion of Gypsy Spanish prisoners who were
(b)
(c)
(d)
(e)
(c)
H1: μ1 − μ2 ≠ 0
(d)
t0
(e)
(ii)
(f)
We have weak evidence that the proportion of white Spanish prisoners who were
infected with TB is different to the proportion of Gypsy Spanish prisoners. We estimate
that the proportion of White prisoners who had TB is somewhere between 1
percentage point lower than and 16 percentage points higher than the proportion of
Gypsy prisoners who had TB.
With 95% confidence, we estimate that the mean daily revenue of the first laundry
is somewhere between $1.96 less than and $69.41 more than the mean daily
revenue of the second laundry.
If the true difference in population means is somewhere in this interval, then it
could be as small as $1.96 (which is not of practical importance) or as big as
$69.41 (which is of practical importance).
The observed difference, 0.073, is not statistically significant at the 5% level (even
though we have weak evidence against H0, it is not strong enough for the test to be
statistically significant at the 5% level).
(i)
With 95% confidence, we estimate that the proportion of White prisoners who had TB
is somewhere between 1.3 percentage points lower than and 16 percentage points
higher than the proportion of Gypsy prisoners who had TB.
t0 = (33.724 − 0)/17.449 = 1.933
- that the mean daily revenue for laundry 1 is not the same as that for laundry 2.
t0 = (0.0730 − 0)/0.0438 = 1.6667
(h)
(2-sided hypothesis)
- that the proportion of white Spanish prisoners who were infected with TB is not the
same as the proportion of Gypsy Spanish prisoners who were infected with TB.
(g)
(i)
(iii) t0
(f)
The estimated difference is 1.933 standard errors away from the hypothesised
difference.
prisoners who were infected with TB and the proportion of the sample of Gypsy
Spanish prisoners who were infected with TB.
496 74
(b)
We have some evidence that there is a difference between the mean daily revenues
for the two laundries, with the mean daily income for laundry 1 being higher. We do
not have sufficient information to be able to determine whether the true difference in
mean daily incomes is big enough to be of any practical importance.
Even though it is plausible that the difference between the two laundries is so small as
to be of no practical importance, it is also plausible that the mean daily income of
laundry 1 is sufficiently greater than that of laundry 2 as to be of practical importance.
Therefore we should recommend laundry 1.
2.
(i)
(ii)
Chapter 8
2.
What might be an explanation (other than the breastfeeding) for the significant difference in the mean GCI scores between the breastfed and the non-breastfed infants?
This observational study allows us to claim that breastfeeding (the factor of interest) and GCI scores (the response) are related:
[Diagram: Breastfeeding and GCI score]
Breastfeeding results in an increase (is the cause of the increase / explains the increase) in the mean GCI score.
[Diagram: Breastfeeding → GCI score]
OR
There could be another variable (a lurking or confounding variable) which has an effect on both breastfeeding and GCI score and, as such, is the real cause of the difference between the mean GCI scores.
For example, maybe breastfeeding is affected by parents' education level (better educated parents tend to breastfeed their infants) and maybe there is a causal relationship between parents' education level and GCI score (parents' higher education levels result in a higher GCI score for their children at age 4 years).
[Diagram: Parents' ed. level, Breastfeeding, GCI score]
Then an alternative explanation for the increase in the mean GCI scores would be the higher level of education of the parents.
In reality in this case, we would not be able to identify the real cause of the higher mean GCI score for the breastfed infants; it could have been the breastfeeding or it could have been the education level of the parents or it could have even been a combination of both breastfeeding and the education level of the parents or even some other unidentified lurking variable.
In an observational study the real cause of a significant difference can very rarely be identified. The real cause could be the factor of interest or it could be a lurking or confounding variable.
similar as possible with respect to all other variables, such as parents' education level. It allows us to classify all other explanations for the observed difference, apart from
groups then we may conclude that the breastfeeding is the real cause of the difference.
Whether it would be possible or even ethical to randomly direct mothers as to how to feed their infants is another issue altogether.
3.
a.
The population to which the link between breastfeeding and GCI may apply should not be systematically different from the sample of infants recruited in this study. We would need to be able to reasonably assume that the 323 recruited infants were a random sample from the described population. For example, the link between breastfeeding and GCI may not hold true for a population which included infants of other races or ethnicities.
b.
The infants in the study should be randomly selected from all New Zealand infants. That is, the infants should be a random sample of all New Zealand infants.
Challenge:
Part I, page 8
Part I page 9
1. (c)
g.
2. (d)
3. (b)
4. (e)
5. (b)
4. (e)
5. (a)
6. (a)
7. (b)
4. (d)
5. (d)
6. (d)
7. (e)
A two sample t-test can still work quite well (especially for large samples) even if there are
clear indications in the data that the Normality assumption is not true.
2. (a)
3. (b)
With paired data, each observation in one group is paired with an observation in the other
group and hence the two groups of data are NOT independent.
e.
With paired data, we analyse the differences. The paired data t-test is mechanically
equivalent to a one sample t-test on the differences. The necessary conditions for
conducting the test are checked by plotting the differences NOT the 2 groups of data.
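To illustrate the point that a paired t-test is just a one sample t-test on the differences, here is a minimal Python sketch that computes only the t-test statistic (no P-value) for hypothetical before/after measurements on the same subjects; the function name and data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Paired t-test statistic = one sample t-test on the differences (H0: mean difference = 0)."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations in each group")
    diffs = [a - b for b, a in zip(before, after)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se


# Hypothetical measurements on the same three subjects (illustrative only).
print(round(paired_t_statistic([5, 7, 9], [6, 9, 12]), 4))  # 3.4641
```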
f.
r.
The one sample t-test can still work quite well (especially for large samples) even if there
are clear indications in the data that the Normality assumption is not true.
u.
A large P-value provides no evidence against the hypothesised value but it does not mean
that the hypothesised value is true.
2. (b)
3. (e)
8. (c)
9. (e)
10. (b)
The null hypothesis in an F-test for one-way analysis of variance is that all of the underlying
means are equal.
d.
The alternative hypothesis in an F-test for one-way analysis of variance is that some of the
underlying means are different.
f.
The alternative hypothesis in an F-test for one-way analysis of variance is that at least two
of the underlying means are different.
j.
If the P-value for an F-test for one-way analysis of variance is large then the null hypothesis
is believable.
m.
If the P-value for an F-test for one-way analysis of variance is very small this suggests that
there are differences between at least two of the underlying means, but we would need to
look at pairwise confidence intervals to estimate the size of any differences.
p.
The assumption that the underlying distributions are Normally distributed is not critical for
the F-test for one-way analysis of variance. The F-test for one-way analysis of variance is
reasonably robust to departures from the Normality assumption.
r.
One of the assumptions for the F-test for one-way analysis of variance is that the underlying
population standard deviations are all equal.
Tutorial: Pages 35 to 39
Page 34
1.
1.
(a)
2.
(a)
2.
When we are testing for equality of the underlying means of more than two groups.
3.
Paired data.
(b)
(c)
Paired data.
A t-test on the differences is more appropriate. A pair of observations is made on the same
subject so this is paired comparison data.
(b)
Since we have paired data we look at the dot plot of the differences. The dot plot shows
differences centred below 0 (current purchases higher than previous purchases) and slight
negative skewness.
H0: μDiff = 0 vs H1: μDiff ≠ 0
4.
5.
A large P-value tells us that we have no evidence against the underlying means all being
equal, i.e., it is believable that the underlying means are equal.
A small P-value tells us that we have evidence against the underlying means all being equal,
i.e., we have evidence that differences exist between some (possibly all) of the underlying
means.
P-value = 0.033
A small P-value tells us nothing about which means differ from one another, and it also tells
us nothing about the size of any differences.
6.
7.
8.
9.
(c)
We have some evidence against there being no difference between the mean amounts of current and previous spending. It appears that, on average, access to the cable network is associated with an increase in spending by viewers. With 95% confidence, we estimate that viewers spend, on average, between $3.10 and $62.50 more when they have access to the cable network.
From the Tukey confidence intervals for differences between pairs of underlying means.
The observations within each sample are independent. (Critical)
The samples are independent. (Critical)
The underlying distributions are Normally distributed. (The test is reasonably robust against
departures from this assumption, especially when the sample sizes are similar and the total
sample size is large)
The standard deviations of the underlying distributions are equal. (The test is reasonably
robust against departures from this assumption, but the confidence intervals are not.)
The multiple comparisons problem
10. The F-test is reasonably robust against departures from this assumption so we can rely on
the P-value.
The confidence intervals are not robust against departures from this assumption so we
cannot rely on them.
(d)
The dot plot shows slight skewness, but the t-test is robust to such departures from
Normality. The results of the t-test should be valid in this situation.
(a)
(b)
Ensure observations within the sample are independent: read the story.
By plotting the data. The choice of plot will depend on the sample sizes.
By plotting the data and/or looking at the sample standard deviations (We
require that the ratio of the largest sample standard deviation to the smallest
sample standard deviation is less than 2.).
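The equal-standard-deviation rule of thumb above is easy to automate. This sketch (function name illustrative) applies it to the largest and smallest sample standard deviations quoted in the sleep-time question later in this chapter, 23.14 and 15.22:

```python
def equal_sd_check(sample_sds):
    """Rule of thumb: largest sample SD / smallest sample SD should be less than 2."""
    ratio = max(sample_sds) / min(sample_sds)
    return ratio, ratio < 2


ratio, reasonable = equal_sd_check([23.14, 15.22])
print(f"ratio = {ratio:.3f}, equal-SD assumption reasonable: {reasonable}")
# ratio = 1.520, equal-SD assumption reasonable: True
```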
(c)
(d)
ANOVA table:
Source       DF      SS      MS      F        P
Treatment     2    3330    1665   4.83    0.011
Error        72   24868     345
Total        74   28198
sW² measures the variability within the samples (that is, the internal variability within the samples themselves).
(d)
f0 = sB² / sW²
(e)
f0: smaller / larger
f0: smaller / larger
f0: smaller / larger
f0: smaller / larger
(a)
(b)
(i)
(ii)
(i)
Yes. The Neither level of treatment. We have strong evidence that the mean for the neither group is higher than the mean for the drug group and we have some to weak evidence that the mean for the neither group is higher than the mean for the placebo group.
(ii)
(h)
(ii)
Though there are signs of moderate positive skewness in all three of the groups, with the equal sample sizes and the moderate size of the three groups this should not cause any concern with the validity of the F-test.
H0: The underlying mean numbers of minutes to fall asleep are the same, i.e., μDrug = μNeither = μPlacebo, where μDrug is the mean time for patients to fall asleep if all 75 patients had been given the new drug, and similarly for μNeither and μPlacebo.
H1: At least one of the three underlying mean numbers of minutes to fall asleep is different from the other two.
(g)
The times for the Neither group are centred higher and more spread out than those for the Drug and Placebo groups. There does not appear to be a great difference between the centre and spread of times for the Drug and Placebo groups. There are signs of moderate positive skewness in all three of the samples.
With 95% confidence we estimate that the underlying mean time for people taking the placebo to fall asleep is somewhere between 8.9 minutes shorter and 16.2 minutes longer than the underlying mean time for people taking the drug to fall asleep.
The assumption of equality of the standard deviations is reasonable as the ratio of the largest sample standard deviation to the smallest sample standard deviation is less than 2 (23.14/15.22 = 1.520).
(c)
(f)
The P-value of 0.011 means that we have strong evidence against the null hypothesis. We have strong evidence that at least one of the groups has a different underlying mean number of minutes for people to fall asleep.
(e)
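The F-test statistic is the ratio of the two mean squares, f0 = sB²/sW², where each mean square is SS/DF. The Python sketch below rebuilds those columns from the sums of squares and degrees of freedom in the sleep-time ANOVA table. For the P-value it uses the closed form P(F > f) = (1 + 2f/d)^(−d/2), which is valid only for an F distribution with 2 numerator degrees of freedom (as here, with three groups); in general you would use statistical software. Using the unrounded error mean square (24868/72 ≈ 345.39) gives F ≈ 4.82; the table's 4.83 matches 1665/345, i.e., dividing by the rounded mean square.

```python
def anova_mean_squares(ss_treatment, df_treatment, ss_error, df_error):
    """Return (MS treatment, MS error, F) from the SS and DF columns of a one-way ANOVA table."""
    ms_treatment = ss_treatment / df_treatment  # sB^2: variability between the group means
    ms_error = ss_error / df_error              # sW^2: variability within the samples
    return ms_treatment, ms_error, ms_treatment / ms_error


def f_tail_two_numerator_df(f, df_error):
    """P(F > f) for F(2, df_error); this closed form only holds for 2 numerator df."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)


ms_t, ms_e, f0 = anova_mean_squares(3330, 2, 24868, 72)
p_value = f_tail_two_numerator_df(f0, 72)
print(f"MS treatment = {ms_t:.0f}, MS error = {ms_e:.2f}")
print(f"F = {f0:.2f}, P-value = {p_value:.3f}")
```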
Scenario 2
(i)
Pass: categorical.
(ii)
One-way table of counts (frequency table) or bar graph comparing the counts or
proportion who pass and fail.
(ii)
Dot plot, box plot or histogram of the differences between Test and Assign (or vice
versa).
(ii)
Chapter 9
Challenge: Page 15
1.
(a)
(b)
Yes.
f.
If, for several cells in a table of counts, there are relatively large differences between the observed counts and the expected counts under the null hypothesis, then the P-value for a
(c)
Yes.
(d)
(e)
Several of the expected cell counts are very low, so there may be problems with the
assumptions for the Chi-square test.
(a)
(b)
No. The distribution of courses will reflect the chosen sample sizes, not the true
distribution.
(c)
Yes.
(d)
(e)
(a)
Yes. We could consider the samples of people under 30, people in the 30-49 age group
and the people in the 50 and over age group as three independent sub-samples and
carry out a Chi-square test that the distribution of primary news source is the same for
each age group.
(b)
(c)
(d)
Cell contribution = (100 − 51.625)²/51.625 = 45.330
2. (b)
3. (d)
4. (d)
5. (e)
6. (d)
7. (b)
8. (e)
2.
3.
(e)
Expected count = (225 × 250)/1000 = 56.25
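Both calculations above follow the standard Chi-square recipe: expected count = row total × column total / grand total, and each cell contributes (observed − expected)²/expected to the test statistic, which is the sum of these contributions over all cells. A minimal Python sketch (function names illustrative):

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under H0: row total x column total / grand total."""
    return row_total * col_total / grand_total


def cell_contribution(observed, expected):
    """A cell's contribution to the Chi-square statistic: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


def chi_square_statistic(observed, expected):
    """Sum of the cell contributions over all cells (cells given as flat lists)."""
    return sum(cell_contribution(o, e) for o, e in zip(observed, expected))


print(expected_count(225, 250, 1000))            # 56.25
print(round(cell_contribution(100, 51.625), 3))  # 45.33
```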
(f)
The results are valid because all of the expected counts are greater than 5.
Chapter 10
Challenge
Part I, page 21
Part I, page 19
No. Causation can only be assigned when the data come from a well designed and well
executed experiment.
If we want to use the y-values to make predictions about x-values then the regression line
would be different and its equation would have different values.
l.
Residuals and prediction errors are different names for the same thing: the difference between the observed and predicted values.
r.
The Y-variable is called the response, outcome or dependent variable and the
X-variable is called the explanatory, predictor or independent variable.
y.
The sample correlation coefficient, r, and the slope of the least squares line are measuring
different things and so will not usually be the same.
z.
The sign (+ or -) of the correlation coefficient and the sign of the slope of the least squares
regression line are both indicating the direction of the association and hence will be the
same.
2.
3.
H0: β1 = 0
4.
A confidence interval for the mean estimates the mean value of Y at a given value of x.
A prediction interval estimates the value of Y at a given value of x.
5.
l.
s.
For a given value of x, the 95% prediction interval and the corresponding 95% confidence
interval for the mean have the same centres.
2.
A confidence interval for the mean only allows for uncertainty about the true values of β0 and β1.
2. (b)
3. (b)
4. (a)
5. (d)
6. (c)
7. (b)
5. (d)
6. (e)
7. (c)
8. (d)
9. (c)
10. (b)
2. (b)
3. (e)
4. (d)
(a)
(b)
ŷ = 11.238 + 1.309x
For each 3-year increase in years of smoking, we expect lung capacity to increase by 1.309 × 3 = 3.927.
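The fitted line can be used directly for estimation. This Python sketch (function name illustrative) evaluates ŷ = 11.238 + 1.309x at 30 years of smoking; the result, about 50.5, is the common centre of the 95% confidence interval for the mean (42.16 to 58.86) and the 95% prediction interval (23.33 to 77.70) reported further down in this answer.

```python
def predicted_lung_capacity(years_smoking):
    """Least squares line from the answer above: y-hat = 11.238 + 1.309 x."""
    return 11.238 + 1.309 * years_smoking


print(round(predicted_lung_capacity(30), 3))                                # 50.508
print(round(predicted_lung_capacity(33) - predicted_lung_capacity(30), 3))  # 3.927
```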
(c)
(d)
(e)
(f)
There is a possible linear trend but the observations (28, 30) and (33, 35) are possible
outliers which cause concern with the appropriateness of the model.
H0: β1 = 0
H1: β1 ≠ 0
P-value = 0.0086
There is strong evidence that an increase in years of smoking is associated with an
increase in lung capacity.
With 95% confidence, we estimate that for every additional year of smoking an emphysema patient's lung capacity increases by between 0.44 and 2.18 units.
(g)
With 95% confidence, we estimate that the mean lung capacity for people like those in
the study that spent 30 years smoking will be somewhere between 42.16 and 58.86.
With 95% confidence, we predict that the lung capacity for a person like those in the
study that spent 30 years smoking will be somewhere between 23.33 and 77.70.
2.
(5)
3.
(2)
4.
(3)
5.
(2)