Lecture and Tutorial Answers

Chapter 1
Challenge: page 31
Section B: Proportional reasoning

1.
(a) (i) 974/4597 = 0.2119
(ii) 2399/4597 = 0.5219
96/242 = 0.3967
96/2198 = 0.0437
(b)
The corrected versions of the false statements are:

d.
In tables designed to convey information quickly and easily, include row or column averages if appropriate.
945/4597 = 0.2056
2.
l.
If a plot of numeric data has a long right tail we say the data are positively skewed.
p.
An outlier should be removed from the data set only if it is found to be a mistake.
r.
For highly skewed data, the sample median is a more sensible measure of the centre than
the sample mean.
u.
About 75% of the observations have a value less than the upper quartile.
Alternative: About 25% of the observations have a value greater than the upper quartile.
(a)
Under 40
v.
(c)
(iii)
bb. It is possible to explore more than two variables at a time, by using techniques such as
colour gradients and subsetting.
                 Under 40   40 or over   Total
Mild cases          16          20         36
Serious cases       15          35         50
Total               31          55         86
16/86 = 0.1860
(b)
(i)
(c)
15/50 = 0.3
Patterns we see in the data may not be facts.
40 or over
(ii)
(d)
(20 + 35 + 15)/86 = 0.8140
20/55 = 0.3636
(iii)
35/86 = 0.4070
3. (a)
Sample Exam / Cecil Test Questions: Pages 32 to 39

1. (c)
2.
9.
(a)
3. (a)
4. (e)
5. (c)
6. (e)
7. (a)
8. (a)
(b)
10. (d)
11. (c)
12. (d)
13. (e)
14. (c)
15. (c)
16. (c)
17. (a)
18. (b)
19. (b)
20. (d)
21. (a)
22. (c)
23. (c)
24. (a)
25. (a)
26. (e)
Tutorial: Pages 40 to 43
(a)
Scatter plot
(b)
Side-by-side plot on the same scale. (Dot plot, box plot, histogram.)
(c)
Bar graphs of the three countries, showing proportions for each level of Inversions.
(d)
2.
Total
300
500
Not in default
2185
77% of 9500 = 7315
9500
Total
2385
7615
10000
Don't Drink Daily
Total
23.85%
(c)
200/2385 = 0.0839
(d)
300/7615 = 0.0394
(e)
(200/2385) ÷ (300/7615) = 0.0839/0.0394 = 2.129
4.
Drink Daily
Side-by-side plot on the same scale. (Dot plot, box plot, histogram.)
Answers will be released on Canvas after the tutorial.
(2)
Lecture and Tutorial Answers: Chapter 1
Low-risk
(b)
Section A: Exploring data

1.
High-risk
40% of 500 = 200
In default
                  Drink Daily           Don't Drink Daily   Total
Male drinker      19% of 5100 = 969     4131                5100
Female drinker    10% of 4900 = 490     4410                4900
Total             1459                  8541                10000
490/1459 = 0.3358
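The table-building step above (turning group totals and percentages into counts, then reading a proportion off one column) can be sketched in Python. This is only an illustration: the function name `two_way_counts` is hypothetical, and reading 490/1459 as the proportion of daily drinkers who are female is an interpretation inferred from the table, not stated in the answer.

```python
def two_way_counts(group_totals, rates):
    """Build a two-way table of counts from group totals and per-group rates.

    group_totals: dict mapping group name -> number of people in the group
    rates: dict mapping group name -> proportion of the group with the outcome
    Returns a dict mapping group name -> (with_outcome, without_outcome, total).
    """
    table = {}
    for group, total in group_totals.items():
        with_outcome = round(rates[group] * total)  # e.g. 19% of 5100 = 969
        table[group] = (with_outcome, total - with_outcome, total)
    # Column totals across all groups
    table["Total"] = tuple(sum(row[i] for row in table.values()) for i in range(3))
    return table


# Figures from the answer above: 5100 males (19% drink daily), 4900 females (10%).
table = two_way_counts({"Male": 5100, "Female": 4900}, {"Male": 0.19, "Female": 0.10})

# Assumed interpretation: proportion of daily drinkers who are female = 490/1459
female_share = table["Female"][0] / table["Total"][0]
print(table["Total"], round(female_share, 4))
```

Counts are rounded to whole people because the percentages in the question produce whole numbers; with other rates the rounding step would need more care.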
Chapter 2
Tutorial: Page 9
Challenge: page 7
1.
(i)

d.
Media reports which describe two variables as being linked or associated cannot be
interpreted as meaning that a change in one variable will cause a change in the other variable.
Study 2: We are comparing males and females.
Study 3: We are comparing the old style and new style of television commercials.
h.
(ii)
If one of the treatment levels in an experiment involving people is a placebo, then blinding
should be used.
2. (e)
Study 1: To make the comparison, we are measuring the survival of the cats.
Study 2: To make the comparison, we are measuring the clothing expenditure for the
next 3 months.
Sample Exam / Cecil Test Questions: Page 8

1. (a)
Study 1: We are comparing cats that fell from 1 or 2 storeys, cats that fell from 3 to 5
storeys and cats that fell from 6 or more storeys.
Study 3: To make the comparison, we are measuring the recall scores for the
commercials.
3. (b)
(iii) Study 1: An observational study. There is no allocation (by the researcher) of subjects
(cats) to the number of storeys of the fall. Results are simply observed for cases that
happen.
Study 2: An observational study. There is no allocation (by the researcher) of subjects
(students) to the groups (male or female).
Study 3: An experiment. The researcher allocates which commercial is to be watched
by each subject (shopper).
(iv) It is not possible to do an experiment for study 1 due to ethical and moral
considerations. To do an experiment a sample of cats would have to be allocated a
height and then thrown out of a window at that height.
It is not possible to do an experiment for study 2 as the researcher cannot allocate a
gender to a student.
2.
(2)
False.
The corrected version of the false statement is:
Random allocation of treatments to subjects does not guarantee comparability of
treatment groups, even when we only have small numbers of subjects.
Chapter 3
3.
(a)
(i)
Blinding was used in this study: the technician doing the cleaning and assessing
the results was not aware of which version of oven cleaner was used. (The ovens
were also unaware of which cleaner was used.)
(ii)
The control group is the current version of the oven cleaner.
Challenge: page 13
i.
When the tail proportion is large then the observed difference is not unusual when chance
is acting alone, therefore chance COULD have been acting alone. All we can say is that we
have no evidence against "chance was acting alone", but this cannot be interpreted as
"chance was acting alone".
(iii) There was no blocking in this study. (No other factors apart from version of oven
cleaner were taken into account.)
(b)
The results of this study can be used to establish that any difference in mean
effectiveness score between the two versions of oven cleaner was caused by the
differences in the oven cleaners as there was random allocation of ovens to the two
treatment groups (current and new versions of oven cleaner).
(c)
Our tail probability of 0.17 means the observed difference is not unusual when chance
is acting alone, therefore chance COULD be acting alone.
(d)
We have no evidence that the new version of the oven cleaner gives higher
effectiveness scores, on average, than the current version.
(e)
We cannot conclude that the new version of the oven cleaner has the same
effectiveness scores, on average, as the current version - chance COULD be acting
alone OR something else as well as chance COULD also be acting (we don't have
enough information to determine which one of these two possibilities applies).
(a)
Blinding could be used in this study, if the male patients did not know the type of
surgery they had.
(b)
The Re-randomised data plot shows one of the 1000 re-randomisations under
chance alone. In this re-randomisation the difference between the re-randomised
proportions, under chance alone, is 20/228 − 22/231 = −0.008.
(c)
Our tail proportion of 0.069 means that, when chance is acting alone, a difference
between the group proportions of 0.045 or more is highly unlikely. In the actual study,
an observed difference between the two proportions of 0.045 or more would have been
highly unlikely if chance had been acting alone, therefore we are pretty sure that
chance was not acting alone in the actual study.
(d)
We can conclude that the type of surgery had an impact on whether the male patient
had a major event (death, heart attack or stroke) within 2 years of their surgery. We
are pretty sure that chance was not acting alone in the study and the study is a
well-designed experiment in which the male patients were randomly allocated to one
of the two types of surgery so a causal claim may be made.
Sample Exam / Cecil Test Questions Pages 13 and 14

1. (c)
2. (b)
3. (e)
4. (c)
1.
In a randomisation test we are assessing the plausibility of the explanation that an observed
difference between two groups is solely due to chance, i.e., is due to chance acting alone.
We can say that the chance alone explanation is implausible if our observed difference is
unusual when chance is acting alone.
In a randomisation test, we randomly re-assign each unit to a group and, with only chance
acting, we calculate the difference between the two groups.
We repeat this re-randomisation a large number of times, and plot all of the differences to
get the re-randomisation distribution. We can then see whether our observed difference
is unlikely under chance alone.
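The re-randomisation procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's software: the function name `randomisation_test` and the two small data sets are hypothetical, the statistic is a difference in means, and the tail proportion is counted in one direction only (differences at least as big as the observed one).

```python
import random


def randomisation_test(group_a, group_b, n_rerandomisations=1000, seed=1):
    """Return (observed difference in means, tail proportion under chance alone)."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(group_a) - mean(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_big = 0
    for _ in range(n_rerandomisations):
        rng.shuffle(pooled)  # randomly re-assign every unit to a group
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Count re-randomised differences at least as big as the observed one
        if diff >= observed:
            at_least_as_big += 1
    return observed, at_least_as_big / n_rerandomisations


# Hypothetical scores for two treatment groups (not data from the coursebook)
new_version = [8, 9, 7, 9, 10, 8]
current_version = [7, 8, 6, 9, 7, 8]
observed, tail = randomisation_test(new_version, current_version)
print(observed, tail)
```

A small tail proportion says the observed difference would be unusual if chance were acting alone; a large one says only that chance COULD be acting alone.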
(a)
The tail proportion is the proportion of times we get a difference at least as big as our
observed difference when chance is acting alone.
(b)
(c)
4.
When the tail proportion in the re-randomisation distribution is less than 5% then:
- the observed difference would be unlikely when chance is acting alone, therefore it's a fairly safe bet chance isn't acting alone.
- we have evidence against "chance is acting alone"
- we have evidence that chance is not acting alone
When the tail proportion in the re-randomisation distribution is bigger than 5% then:
- the observed difference is not unusual when chance is acting alone, therefore chance COULD be acting alone
- we have NO evidence against "chance is acting alone"
- chance COULD be acting alone OR something else as well as chance COULD also be acting (we don't have enough information to determine which one of these two possibilities applies).
Chapter 4
Challenge: Page 7
d. Sampling errors are smaller in larger samples than in smaller samples.
j. This is a common misconception. The size of the sampling error is not dependent on the
size of the population.
n. Using random sampling in statistical surveys does not guarantee that each sample(s) will be
representative but it ensures that in the long run over repeated samples of data, the
samples will, on average, be representative.

1. (b)
2. (a)
3. (c)
4. (e)
Tutorial: Page 9
1.
(4)
Selection bias and self-selection bias.
2.
(3)
The Hobbit, Wild Swans, The Power of One, April Fool's Day.
3.
(3)
False.
It is unlikely that sophisticated sampling projections can correct the results if the
population you are sampling from is different to the one of interest.
4.
(5)
Chapter 5
6.
(a)
The parameter we are interested in is the proportion of New Zealand adults who, in
2015, had trust and confidence in Members of Parliament.
(b)
0.25
(c)
The blue vertical lines in the re-sample plot represent 1000 re-sample proportions
taken with replacement from the original sample. They show the extent of the variation
in the proportions in these 1000 re-samples.
Challenge: Page 13
The corrected version of the false statement is:
d
A bootstrap re-sample is obtained by random sampling, with replacement, from the
original sample, not from the population.

(d)

1. (b)
2. (d)
3. (d)
4. (e)
7.
(a)
5. (c)
(b)
Yes. 10 percentage points is inside the bootstrap confidence interval.
(c)
It is a fairly safe bet that the proportion of New Zealand adults who, in 2015, had trust
and confidence in Members of Parliament is somewhere between 2 and 12 percentage
points higher than the corresponding proportion in 2013.
(d)
We don't know but it is a fairly safe bet that it is in this interval.
1.
A parameter is a numerical characteristic of a population or distribution.

An estimate is a known quantity calculated from the (sample) data to estimate an unknown
parameter.
It is a fairly safe bet that the proportion of New Zealand adults who, in 2015, had trust
and confidence in Members of Parliament is somewhere between 0.21 and 0.28.
The lower and upper limits of the bootstrap confidence interval were obtained by not
including the bottom and top 2.5% of the re-sample proportions.
The process of using sample data to try and make useful statements about an unknown
parameter is called sample-to-population inference.
2.
We form intervals of believable values rather than just stating individual estimates to give
an indication of the level of uncertainty in the estimate.
3.
(a)
The parameter we are interested in is μFemale, the mean number of text messages sent
per day by females for the population.
4.
5.
(b)
We do not know the value of the parameter.
(c)
Our estimate of the parameter is the sample mean: x̄Female = 20.92.
(d)
The re-samples are randomly selected from the original sample with replacement using
the same sample size as the original sample.
(e)
It is a fairly safe bet that the mean number of text messages sent per day by females
is somewhere between 14.3 and 28.4.
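The re-sampling steps in (d) and the interval in (e) can be sketched as a percentile bootstrap in Python. This is a sketch under assumptions: the function name `bootstrap_interval` and the data are hypothetical (the original text-message sample is not reproduced in these answers), and the interval is formed by dropping the bottom and top 2.5% of the re-sample means.

```python
import random


def bootstrap_interval(sample, n_resamples=1000, seed=1):
    """Percentile bootstrap interval (central 95%) for a mean."""
    rng = random.Random(seed)
    n = len(sample)
    resample_means = []
    for _ in range(n_resamples):
        # Re-sample with replacement, using the same size as the original sample
        resample = [rng.choice(sample) for _ in range(n)]
        resample_means.append(sum(resample) / n)
    resample_means.sort()
    # Drop the bottom 2.5% and the top 2.5% of the re-sample means
    cut = int(0.025 * n_resamples)
    return resample_means[cut], resample_means[n_resamples - cut - 1]


# Hypothetical numbers of text messages sent per day (illustration only)
texts_per_day = [3, 10, 12, 15, 18, 20, 22, 25, 30, 54]
lower, upper = bootstrap_interval(texts_per_day)
print(lower, upper)
```

The interval is read the same way as in the answer: it is a fairly safe bet that the population mean lies somewhere between the two limits.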
(a)
Each point in the bootstrap distribution represents the difference between the female
and male means when the original sample has been re-sampled with replacement.
(b)
It is a fairly safe bet that the mean number of text messages sent per day by females
is somewhere between 12.7 lower and 14.5 higher than the mean number of text
messages sent per day by males.
(c)
As 0 is in the bootstrap confidence interval, it is believable that, on average, males and
females send the same number of text messages per day.
Selection bias: the sample is only from one company and is also only for the number of
calls made on Tuesdays.
Chapter 6
4.
Challenge: Page 20
q
(a)
(3)
(b)
(5)
5. (a)
(i)
Parameter
(ii)
Estimate = x̄, the mean exam mark for the sample of thirty STATS 108
students = 31.97 marks.
There is no way of knowing whether a confidence interval actually contains the true
unknown value of the parameter. We simply take comfort in the fact that the method
works (i.e., produces confidence intervals that do contain the true value) most of the
time. For example, approximately 95% of 95% confidence intervals contain the true
value of the parameter.
Increasing the level of confidence increases the value of the t-multiplier and hence
increases the width of the confidence interval but has no effect on the value of the
estimate.
(iii) Formula: estimate ± t × se(estimate) gives x̄ ± t × se(x̄)

(iv) se(x̄) = 1.7251
(v)
2. (c)
3. (d)
4. (b)
5. (a)
x̄ ± t × se(x̄) = 31.97 ± 2.045 × 1.7251 = 31.97 ± 3.5278 = (28.44, 35.50)
(vii) There are many ways of interpreting a confidence interval. Two different ways
follow.
6. (a)
(b)
Section A: Confidence intervals for a single mean or proportion
1.
2.
3.
t-multiplier = 2.045
(vi) 95% confidence interval for μ:

1. (b)
Parameter = μ, the population mean exam mark for the STATS 108 exam.
(1)
With 95% confidence, we estimate that the population mean exam mark is
somewhere between 28.44 and 35.50 marks.
(2)
With 95% confidence, we estimate that the population mean exam mark is
31.97 with a margin of error of 3.53.
We don't know. The population mean mark is not known so we don't know whether
this particular 95% confidence interval contains the population mean. However, in the
long run, the population mean will be contained in 95% of the 95% confidence intervals
calculated from such samples.
A measure of the amount a sample estimate varies from sample to sample is called the
standard error of the estimate.
Section B: Confidence interval for a difference in means or proportions
It roughly measures the average distance between an estimate and the population
parameter over all possible samples of a given size that can be taken from the population.
1.
(4)
2.
(a)
Situation (b): Single sample, several response categories
(b)
Situation (a): Two independent samples
(c)
Situation (c): Single sample, two or more Yes/No items
(d)
Situation (a): Two independent samples
(a)
(b)
Situation (c): Single sample, two or more Yes/No items .

(i) Parameter = pW − pG, the true difference in the proportion of white Spanish
If we were to take a huge number of samples of 40 days and calculate their sample means
then we estimate that the average distance these sample means would be from the true
population mean would be roughly 2.12.
The formula for calculating a confidence interval is:
estimate ± t-multiplier × standard error(estimate)
3.
For a specified level of confidence and number of degrees of freedom, a t-multiplier is the
number of standard errors between the estimate and each confidence limit.
For a given level of confidence, the t-multiplier decreases as the degrees of freedom
increase.
The t-multiplier multiplied by the standard error of the estimate is called the margin of error
and is half the width of a confidence interval.
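As a worked check of the formula and the margin of error, the arithmetic can be written out in Python. The helper name `confidence_interval` is hypothetical; the inputs are the values quoted in these answers for the STATS 108 exam marks (estimate 31.97, t-multiplier 2.045, standard error 1.7251).

```python
def confidence_interval(estimate, t_multiplier, std_error):
    """Return (lower, upper, margin_of_error) for estimate ± t × se(estimate)."""
    margin = t_multiplier * std_error  # margin of error = half the interval width
    return estimate - margin, estimate + margin, margin


# Values quoted in the answers: x̄ = 31.97, t = 2.045, se(x̄) = 1.7251
lower, upper, margin = confidence_interval(31.97, 2.045, 1.7251)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```

Raising the confidence level raises the t-multiplier, which widens the interval without moving the estimate.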
prisoners who were infected with TB and the proportion of Gypsy Spanish
prisoners who were infected with TB.
(ii)
Estimate = p̂W − p̂G, the difference in the proportion of the sample of white
Spanish prisoners who were infected with TB and the proportion of the sample of
Gypsy Spanish prisoners who were infected with TB.
= 496/886 − 74/152 = 0.5598 − 0.4868 = 0.0730
(iii) Formula: estimate ± t × se(estimate) gives (p̂W − p̂G) ± t × se(p̂W − p̂G)
(iv) Sampling situation (a)
se(p̂W − p̂G) = 0.0438
(v)
For a 95% confidence interval with df = ∞ use t = 1.96
(vi) 95% c.i. is: (p̂W − p̂G) ± t × se(p̂W − p̂G)
= 0.0730 ± 1.96 × 0.043837
= 0.0730 ± 0.08592
= (−0.0129, 0.1589)
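The calculation above can be reproduced in Python. This sketch assumes that "sampling situation (a)" uses the usual standard error for two independent sample proportions, sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2); that formula is not written out in these answers, but it reproduces the quoted value 0.043837. The function name `diff_in_proportions_ci` is hypothetical.

```python
from math import sqrt


def diff_in_proportions_ci(x1, n1, x2, n2, multiplier=1.96):
    """Estimate, standard error and interval for p1 - p2 (two independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    estimate = p1 - p2
    # Assumed formula: standard error for two independent sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = multiplier * se
    return estimate, se, (estimate - margin, estimate + margin)


# Counts from the answer: 496 of 886 white prisoners, 74 of 152 Gypsy prisoners
estimate, se, (lower, upper) = diff_in_proportions_ci(496, 886, 74, 152)
print(round(estimate, 4), round(se, 6), round(lower, 4), round(upper, 4))
```

The lower limit is negative, which is why the interpretation reads "1.3 percentage points lower than" at one end and "16 percentage points higher than" at the other.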
(vii) With 95% confidence, we estimate the proportion of white prisoners who were
infected with TB to be somewhere between 1.3 percentage points lower than and
16 percentage points higher than the proportion of Gypsy prisoners who were
infected with TB.
5.
(3)
Chapter 7
3.
Challenge: Page 18
The corrected versions of the false statements are
h.
In a t-test, the sidedness of the alternative hypothesis (one-sided or two-sided) is
determined by what we expect to be true if the null hypothesis is not true, i.e., if parameter
= hypothesised value is not true, do we expect parameter > hypothesised value (1-sided),
or parameter < hypothesised value (1-sided), or don't we know which direction, in which
case we use parameter ≠ hypothesised value (2-sided). We must not use the data to decide
which relation (>, < or ≠) to use in the alternative hypothesis.
l.
A large P-value means that we have nothing against the null hypothesis so it could be true
. . . which doesn't mean that it is true! (See Question (p).)
q.
Same as l above. Nonsignificant results (large P-values) do not mean that the null
hypothesis is true, they mean it could be true.
x.
See Example 1, page 12, (birth month effect on height example). It is possible to have
established the existence of an effect (small P-value, statistical significance) but for that
effect to be so small as to be of no practical importance/significance. (See Question (w).)
P-value    Evidence against H0
> 0.10     none
0.10       weak
0.05       some
0.01       strong
0.001      very strong
4.
P-value < 5%
5.
Nothing.
6.
A confidence interval.
7.
A one-tailed test is used when the investigators have good grounds, before the study began,
for believing the departure from the null hypothesis goes in one particular direction.
Otherwise, or if in doubt, a two-tailed test is used. Good grounds mean that there is prior
information or there is a theory to tell the investigators which way the study is likely to go.
8.
The t-test statistic measures the number of standard errors the estimate is away from the
hypothesised value.
The more standard errors the estimate is away from the hypothesised value, the larger the
magnitude of the t-test statistic.
The larger the magnitude of the t-test statistic, the smaller the resulting P-value and hence
the stronger the evidence against the null hypothesis.

The smaller the magnitude of the t-test statistic, the larger the resulting P-value and hence
the weaker the evidence against the null hypothesis.

1. (b)
2. (c)
3. (e)
4. (b)
5. (d)
6. (e)
9.
When we deal with studies in which the data have been produced by:
- random assignment of units to treatment groups (an experiment), we can make experiment-to-causation inferences.
- random sampling of units from a population or populations, we can make sample-to-population inferences.
Section A: Quiz
1.
We test the null hypothesis and determine how much evidence we have against it.
The null hypothesis usually takes a sceptical point of view: the researcher's hunch is
nonsense, there is nothing new or interesting happening, there is no effect.
In most situations the researcher hopes to disprove or reject H0.
The alternative hypothesis corresponds to the research hypothesis. It usually takes the
form that something is happening, there is a difference or an effect, there is a relationship.
In most situations the researcher hopes to give support to H1 by showing that H0 is not
believable.
2.
To measure the strength of evidence against the null hypothesis, we calculate a P-value.
The P-value is the conditional probability of observing a test statistic at least as extreme as
that observed, given that the null hypothesis is true.
We can estimate P-values either by a theory-based approach (e.g. t-tests) or a simulation-based approach (e.g. randomisation tests).
Section B: Doing Tests by Hand

1.
(a) Parameter = pW − pG, the true difference in the proportion of white Spanish prisoners
Section C: Interpreting Output and Interpretation Issues

1.
(a) Parameter = μ1 − μ2, the difference between the mean daily revenue for laundry 1 and
who were infected with TB and the proportion of Gypsy Spanish prisoners who were
(b)
infected with TB.

H0: pW − pG = 0
(c)
H1: pW − pG ≠ 0 (2-sided hypothesis)
(d)
Estimate = p̂W − p̂G, the difference in the proportion of the sample of white Spanish
(e)
(c)
H1: μ1 − μ2 ≠ 0
(d)
t0
(e)
(ii)
(f)
Use estimate ± t × se(estimate)

estimate = 0.0730, se(estimate) = 0.043837, t = 1.96
95% confidence interval is: 0.0730 ± 1.96 × 0.0438 = (−0.0128, 0.1588)
We have weak evidence that the proportion of white Spanish prisoners who were
infected with TB is different to the proportion of Gypsy Spanish prisoners. We estimate
that the proportion of White prisoners who had TB is somewhere between 1
percentage point lower than and 16 percentage points higher than the proportion of
Gypsy prisoners who had TB.
With 95% confidence, we estimate that the mean daily revenue of the first laundry
is somewhere between $1.96 less than and $69.41 more than the mean daily
revenue of the second laundry.
If the true difference in population means is somewhere in this interval, then it
could be as small as $1.96 (which is not of practical importance) or as big as
$69.41 (which is of practical importance).
The observed difference, 0.073, is not statistically significant at the 5% level (even
though we have weak evidence against H0, it is not strong enough for the test to be
statistically significant at the 5% level).
(i)
We have some evidence:

- against H0 in favour of H1.
The observed difference, $33.70, is not a statistically significant result at the 5%

level (even though we have evidence against H0, it is not strong enough for the
test to be statistically significant at the 5% level).
We have weak evidence:

- against H0 in favour of H1.
With 95% confidence, we estimate that the proportion of White prisoners who had TB
is somewhere between 1.3 percentage points lower than and 16 percentage points
higher than the proportion of Gypsy prisoners who had TB.
(33.724 − 0)/17.449 = 1.933.
- that the mean daily revenue for laundry 1 is not the same as that for laundry 2.
(0.0730 − 0)/0.0438 = 1.6667
(h)
(2-sided hypothesis)
- that a laundry effect exists for daily revenue.
- that the proportion of white Spanish prisoners who were infected with TB is not the
same as the proportion of Gypsy Spanish prisoners who were infected with TB.
(g)
(i)
Sampling situation (a)
(iii) t0
(f)
that for laundry 2.

H0: μ1 − μ2 = 0
The estimated difference is 1.933 standard errors away from the hypothesised
difference.
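The t-test statistic is the number of standard errors between the estimate and the hypothesised value, and the arithmetic can be checked in Python. The function names are hypothetical. The P-value helper assumes the df = ∞ case (a standard Normal distribution), so it is applied only to the proportions example; the laundry example's degrees of freedom are not given in these answers, so no P-value is computed for it.

```python
from math import erf, sqrt


def t_statistic(estimate, hypothesised_value, std_error):
    """Number of standard errors the estimate is away from the hypothesised value."""
    return (estimate - hypothesised_value) / std_error


def two_sided_p_value_normal(t0):
    """Two-sided P-value from the standard Normal (assumes the df = infinity case)."""
    cdf = 0.5 * (1 + erf(abs(t0) / sqrt(2)))
    return 2 * (1 - cdf)


# Values quoted in the answers
t0_laundry = t_statistic(33.724, 0, 17.449)     # difference in mean daily revenue
t0_prisoners = t_statistic(0.0730, 0, 0.0438)   # difference in proportions
p_prisoners = two_sided_p_value_normal(t0_prisoners)
print(round(t0_laundry, 3), round(t0_prisoners, 4), round(p_prisoners, 3))
```

Under these assumptions the P-value for the proportions example comes out a little under 0.10, in line with the "weak evidence" wording used in the answer.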
prisoners who were infected with TB and the proportion of the sample of Gypsy
Spanish prisoners who were infected with TB.
= 496/886 − 74/152 = 0.5598 − 0.4868 = 0.0730
(i)
t0 = (estimate − hypothesised value) / std error
(ii)
(b)
We have some evidence that there is a difference between the mean daily revenues
for the two laundries, with the mean daily income for laundry 1 being higher. We do
not have sufficient information to be able to determine whether the true difference in
mean daily incomes is big enough to be of any practical importance.
Even though it is plausible that the difference between the two laundries is so small as
to be of no practical importance, it is also plausible that the mean daily income of
laundry 1 is sufficiently greater than that of laundry 2 as to be of practical importance.
Therefore we should recommend laundry 1.
2.
(i)
P-value < 0.05
(ii)
P-value > 0.05
Chapter 8
Exercises for discussion

Page 4
1.
What might be an explanation (other than the breastfeeding) for the significant difference
in the mean GCI scores between the breastfed and the non-breastfed infants?
This observational study allows us to claim that breastfeeding (the factor of interest) and
GCI scores (the response) are related:
[Diagram: Breastfeeding and GCI score]
Then, this relationship could be a causal relationship:
[Diagram: Breastfeeding and GCI score]
Breastfeeding results in an increase (is the cause of the increase / explains the increase) in
the mean GCI score.
OR
There could be another variable (a lurking or confounding variable) which has an effect
on both breastfeeding and GCI score and, as such, is the real cause of the difference
between the mean GCI scores.
For example, maybe breastfeeding is affected by parents' education level (better educated
parents tend to breastfeed their infants) and maybe there is a causal relationship between
parents' education level and GCI score (parents' higher education levels result in a higher
GCI score for their children at age 4 years).
[Diagram: Breastfeeding, GCI score and Parents' ed. level]
Then an alternative explanation for the increase in the mean GCI scores would be the
higher level of education of the parents.
In reality in this case, we would not be able to identify the real cause of the higher mean
GCI score for the breastfed infants; it could have been the breastfeeding or it could have
been the education level of the parents or it could have even been a combination of both
breastfeeding and the education level of the parents or even some other unidentified
lurking variable.
In an observational study the real cause of a significant difference can very rarely be
identified. The real cause could be the factor of interest or it could be a lurking or
confounding variable.
2.
The study would need to be a randomised experiment. We would randomly determine
which mothers would breastfeed and which mothers would not. The random assignment of
mothers to breastfeed and not to breastfeed is an attempt to have the two infant groups as
similar as possible with respect to all other variables, such as parents' education level. It
allows us to classify all other explanations for the observed difference, apart from
breastfeeding, as chance explanations. If we find a significant difference between the two
groups then we may conclude that the breastfeeding is the real cause of the difference.
Whether it would be possible or even ethical to randomly direct mothers as to how to feed
their infants is another issue altogether.
3.
a.
The population to which the link between breastfeeding and GCI may apply should
not be systematically different from the sample of infants recruited in this study. We
would need to be able to reasonably assume that the 323 recruited infants were a
random sample from the described population. For example, the link between
breastfeeding and GCI may not hold true for a population which included infants of
other races or ethnicities.
b.
The infants in the study should be randomly selected from all New Zealand infants.
That is, the infants should be a random sample of all New Zealand infants.
Challenge:
Sample Exam / Cecil Test Questions
Part I, page 8
Part I page 9
The corrected version of the false statements is:
1. (c)
g.
2. (d)
3. (b)
4. (e)
5. (b)
4. (e)
5. (a)
6. (a)
7. (b)
4. (d)
5. (d)
6. (d)
7. (e)
A two sample t-test can still work quite well (especially for large samples) even if there are
clear indications in the data that the Normality assumption is not true.
Part II, page 17
Part II, page 18 and 19

1. (e)
2. (a)
3. (b)
Corrected versions of the false statements are:

b.
With paired data, each observation in one group is paired with an observation in the other
group and hence the two groups of data are NOT independent.
e.
With paired data, we analyse the differences. The paired data t-test is mechanically
equivalent to a one sample t-test on the differences. The necessary conditions for
conducting the test are checked by plotting the differences NOT the 2 groups of data.
f.
Same reason as e. Plot the differences NOT the 2 groups of data.
r.
The one sample t-test can still work quite well (especially for large samples) even if there
are clear indications in the data that the Normality assumption is not true.
u.
A large P-value provides no evidence against the hypothesised value but it does not mean
that the hypothesised value is true.
Part III, pages 31 to 33

1. (d)
2. (b)
3. (e)
8. (c)
9. (e)
10. (b)
Part III, page 30

b.
The null hypothesis in an F-test for one-way analysis of variance is that all of the underlying
means are equal.
d.
The alternative hypothesis in an F-test for one-way analysis of variance is that some of the
underlying means are different.
f.
The alternative hypothesis in an F-test for one-way analysis of variance is that at least two
of the underlying means are different.
j.
If the P-value for an F-test for one-way analysis of variance is large then the null hypothesis
is believable.
m.
If the P-value for an F-test for one-way analysis of variance is very small this suggests that
there are differences between at least two of the underlying means, but we would need to
look at pairwise confidence intervals to estimate the size of any differences.
p.
The assumption that the underlying distributions are Normally distributed is not critical for
the F-test for one-way analysis of variance. The F-test for one-way analysis of variance is
reasonably robust to departures from the Normality assumption.
r.
One of the assumptions for the F-test for one-way analysis of variance is that the underlying
population standard deviations are all equal.
Short Response Questions
Page 34
Section A: Two Independent Samples or Paired Comparisons
1.
1.
(a)
2.
(a)
It is a method which uses a comparison of a measure of spread between group means

with a measure of overall spread within groups to determine whether the data provide
evidence that the underlying group means are not all the same.
2.
When we are testing for equality of the underlying means of more than two groups.
3.
H0: The underlying means are all identical.
Paired data.
(b)
Two independent samples.
(c)
Paired data.
A t-test on the differences is more appropriate. A pair of observations is made on the same
subject so this is paired comparison data.
(b)
Since we have paired data we look at the dot plot of the differences. The dot plot shows
differences centred below 0 (current purchases higher than previous purchases) and slight
negative skewness.
H1: Differences exist between some of the underlying means.
H0: μDiff = 0 vs H1: μDiff ≠ 0
4.
Large values of f0 provide evidence against H0.
5.
A large P-value tells us that we have no evidence against the underlying means all being
equal, i.e., it is believable that the underlying means are equal.
A small P-value tells us that we have evidence against the underlying means all being equal,
i.e., we have evidence that differences exist between some (possibly all) of the underlying
means.
P-value = 0.033
A small P-value tells us nothing about which means differ from one another, and it also tells
us nothing about the size of any differences.
(c)
We have some evidence against there being no difference between the mean amounts of
current and previous spending. It appears that, on average, access to the cable network
is associated with an increase in spending by viewers. With 95% confidence, we estimate
that viewers spend, on average, between $3.10 and $62.50 more when they have access
to the cable network.
6.
7.
8.
9.
From the Tukey confidence intervals for differences between pairs of underlying means.
The observations within each sample are independent. (Critical)
The samples are independent. (Critical)
The underlying distributions are Normally distributed. (The test is reasonably robust against
departures from this assumption, especially when the sample sizes are similar and the total
sample size is large)
The standard deviations of the underlying distributions are equal. (The test is reasonably
robust against departures from this assumption, but the confidence intervals are not.)
The multiple comparisons problem
10. The F-test is reasonably robust against departures from this assumption so we can rely on
the P-value.
The confidence intervals are not robust against departures from this assumption so we
cannot rely on them.
(d)
The dot plot shows slight skewness, but the t-test is robust to such departures from
Normality. The results of the t-test should be valid in this situation.
Section B: More Than Two Independent Samples

1.
(a)
H0: μ1 = μ2 = μ3 = μ4 (The underlying/population means are all equal.)

H1: At least one of the underlying/population means is different from the other three.
(b)
Assumption: The samples are random.

Check:
Ensure observations within the sample are independent (read the story).
Assumption: The samples are independent of each other.

Check:
Ensure independence in the design of the experiment or study (read the story).
Assumption: The underlying distribution of each group is Normally distributed.

Check:
By plotting the data. The choice of plot will depend on the sample sizes.
Assumption: The population standard deviations of each group are equal.

Check:
By plotting the data and/or looking at the sample standard deviations (We
require that the ratio of the largest sample standard deviation to the smallest
sample standard deviation is less than 2.).
(c)
sB2 measures the variability between the sample means.
(d)
ANOVA table:
sW2 measures the variability within the samples (that is, the internal variability within the
samples themselves).
DF
(d)
f0 = sB2 / sW2
sB2 : smaller / same / larger
(e)
f0 : smaller / larger
sW2 : smaller / same / larger
P-value : smaller / larger
more / less evidence against H0
(a)
(b)
Error
72
24868
345
Total
74
28198
(i)
Drug and Neither.
(ii)
Drug and Neither. Placebo and Neither.
(i)
Yes. The Neither level of treatment. We have strong evidence that the mean for
the neither group is higher than the mean for the drug group and we have some
to weak evidence that the mean for the neither group is higher than the mean for
the placebo group.
(ii)
No. We have no evidence of a difference between the underlying means of the

Drug and Placebo groups.
(h)
Section C: Identifying Appropriate Type of Analysis

Scenario 1
(i)
Exam: numeric; Attend: categorical.
There are three independent random samples.
(ii)
Side-by-side dot plot or box plot on the same scale.
Though there are signs of moderate positive skewness in all three of the groups, with
the equal sample sizes and the moderate size of the three groups this should not cause
any concern with the validity of the F-test.
(iii) D: Two-sample t-test on a difference between two means
(Ratio of largest to smallest sample standard deviation: 23.14 / 15.22 = 1.520.)
H0: The underlying mean numbers of minutes to fall asleep are the same, i.e.,
μDrug = μNeither = μPlacebo, where μDrug is the mean time for patients to fall asleep if
all 75 patients had been given the new drug, and similarly for μNeither and μPlacebo.
H1: At least one of the three underlying mean numbers of minutes to fall asleep is
different from the other two.
P
0.011
(g)
The times for the Neither group are centred higher and more spread out than those
for the Drug and Placebo groups. There does not appear to be a great difference
between the centre and spread of times for the Drug and Placebo groups. There are
signs of moderate positive skewness in all three of the samples.
F
4.83
With 95% confidence we estimate that the underlying mean time for people taking the
placebo to fall asleep is somewhere between 8.9 minutes shorter and 16.2 minutes
longer than the underlying mean time for people taking the drug to fall asleep.
The assumption of equality of the standard deviations is reasonable as the ratio of the
largest sample standard deviation to the smallest standard deviation is less than 2
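The rule of thumb used in this answer can be checked with a few lines of Python. This is only a sketch: it assumes that 23.14 and 15.22 (the values quoted in this answer) are the largest and smallest sample standard deviations, and the variable names are illustrative.

```python
# Sample standard deviations quoted in the answer (assumed to be the
# largest and smallest of the three groups).
largest_sd = 23.14
smallest_sd = 15.22

# Rule of thumb: the equal-standard-deviation assumption is treated as
# reasonable when the ratio of largest to smallest is less than 2.
ratio = largest_sd / smallest_sd
equal_sd_reasonable = ratio < 2

print(f"ratio = {ratio:.3f}, assumption reasonable: {equal_sd_reasonable}")
```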
(c)
1665
(f)

2.
3330
The P-value of 0.011 means that we have strong evidence against the null hypothesis.
We have strong evidence that at least one of the groups has a different underlying
mean number of minutes for people to fall asleep.
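As a rough check of the ANOVA table entries, the F-statistic and P-value can be recomputed from the sums of squares. This is a sketch under stated assumptions: the treatment degrees of freedom (2) are derived here as Total minus Error, and the closed-form tail probability used below is valid only when the numerator has exactly 2 degrees of freedom. The recomputed F may differ slightly in the last digit from the tabulated 4.83, presumably because the printed table entries are rounded.

```python
# Sums of squares and degrees of freedom read from the ANOVA table.
ss_treatment, ss_error = 3330, 24868
df_total, df_error = 74, 72

# Derived, not printed in the answer: treatment DF = total DF - error DF.
df_treatment = df_total - df_error

ms_treatment = ss_treatment / df_treatment  # 1665.0
ms_error = ss_error / df_error              # about 345.4
f0 = ms_treatment / ms_error

# With 2 numerator degrees of freedom, the upper-tail probability of the
# F distribution has the closed form (1 + 2*f / df_error) ** (-df_error / 2).
assert df_treatment == 2, "closed form below only holds for 2 numerator DF"
p_value = (1 + 2 * f0 / df_error) ** (-df_error / 2)

print(f"f0 = {f0:.2f}, P-value = {p_value:.3f}")
```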
MS
(e)

SS
Treatment
Scenario 2
(i)
Pass: categorical.
(ii)
One-way table of counts (frequency table) or bar graph comparing the counts or
proportion who pass and fail.
(iii) B: One-sample t-test on a proportion

Scenario 3
(i)
Assign: numeric; Test: numeric.
(ii)
Dot plot, box plot or histogram of the differences between Test and Assign (or vice
versa).
(iii) C: One-sample t-test on a mean of differences / Paired-data t-test

Scenario 4
(i)
Exam: numeric; Degree: categorical.
(ii)
Side-by-side dot plot or box plot on the same scale.
(iii) F: F-test for one-way analysis of variance

Chapter 9
Challenge: Page 15
The corrected version of the false statement is:
f.
If, for several cells in a table of counts, there are relatively large differences between the
observed counts and the expected counts under the null hypothesis, then the P-value for a
Chi-square test will be small.
Tutorial: Page 19 and 20
1.
(a)
One sample, cross-classified by two factors.
(b)
Yes.
(c)
Yes.
(d)
All three hypotheses could be tested.
(e)
Several of the expected cell counts are very low, so there may be problems with the
assumptions for the Chi-square test.
(a)
Independence is satisfied as subjects were randomly assigned to groups.
(b)
No. The distribution of courses will reflect the chosen sample sizes, not the true
distribution.
(c)
Yes.
(d)
Hypothesis (ii) and (iii) could be tested.
(e)
Hypothesis (i) and (iii) could be tested.
(a)
Yes. We could consider the samples of people under 30, people in the 30-49 age group
and the people in the 50 and over age group as three independent sub-samples and
carry out a Chi-square test that the distribution of primary news source is the same for
each age group.
(b)
Degrees of freedom = (3 − 1)(3 − 1) = 2 × 2 = 4
(c)
Expected count for the (Under 30, Radio) cell =
(d)
Cell contribution =

1. (e)
2. (b)
3. (d)
4. (d)
5. (e)
6. (d)
7. (b)
2.
8. (e)
3.
(e)
(225 × 250) / 1000 = 56.25
(100 − 51.625)² / 51.625 = 45.330
The P-value = 0.000 to 3 decimal places.

We have very strong evidence to suggest that there is a relationship between a
person's age and their primary news source.
(f)
The results are valid because all of the expected counts are greater than 5.
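The Chi-square arithmetic in parts (b) to (d) can be reproduced as follows. This is a minimal sketch using only the numbers quoted in the answers; which marginal total is the row total and which is the column total is not stated, so the names below are illustrative.

```python
# Degrees of freedom for a table with 3 rows and 3 columns.
rows, cols = 3, 3
df = (rows - 1) * (cols - 1)

# Expected count = (one marginal total x the other marginal total) / grand total.
marginal_a, marginal_b, grand_total = 225, 250, 1000
expected = marginal_a * marginal_b / grand_total

# Cell contribution = (observed - expected)^2 / expected,
# using the observed and expected counts quoted in part (d).
observed_d, expected_d = 100, 51.625
contribution = (observed_d - expected_d) ** 2 / expected_d

print(df, expected, round(contribution, 3))
```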
Chapter 10
Short Response Questions
Challenge
Part I, page 21
Part I, page 19
No. Causation can only be assigned when the data come from a well designed and well
executed experiment.

e.
If we want to use the y-values to make predictions about x-values then the regression line
would be different and its equation would have different values.
l.
Residuals and prediction errors are different names for the same thing: the
difference between the observed and predicted values.
r.
The Y-variable is called the response, outcome or dependent variable and the
X-variable is called the explanatory, predictor or independent variable.
y.
The sample correlation coefficient, r, and the slope of the least squares line are measuring
different things and so will not usually be the same.
z.
The sign (+ or -) of the correlation coefficient and the sign of the slope of the least squares
regression line are both indicating the direction of the association and hence will be the
same.
Part II, page 25

1.
A linear trend and constant scatter about that trend.
2.
There is a linear relationship between x and the mean value of Y at X = x.

The random errors are Normally distributed with mean zero and all have the same standard
deviation regardless of the value of x.
The random errors are independent.
3.
H0: β1 = 0
4.
A confidence interval for the mean estimates the mean value of Y at a given value of x.
A prediction interval estimates the value of Y at a given value of x.
Part II, page 22
5.

k.
For an observation (xi, yi), the residual is calculated by ûi = yi − ŷi, where ŷi = β̂0 + β̂1 xi.
l.
When testing for no linear relationship between X and Y, we test H0: β1 = 0.
s.
For a given value of x, the 95% prediction interval and the corresponding 95% confidence
interval for the mean have the same centres.
The two sources of uncertainty are:

1.
uncertainty about the true values of β0 and β1, and
2.
uncertainty due to random scatter about the true line.
A confidence interval for the mean only allows for uncertainty about the true values of β0
and β1.
Sample Exam / Cecil Test Questions

Part I, pages 20 and 21
1. (b)
2. (b)
3. (b)
4. (a)
5. (d)
6. (c)
7. (b)
5. (d)
6. (e)
7. (c)
8. (d)
9. (c)
10. (b)
Part II, pages 23 and 24

1. (c)
2. (b)
3. (e)
4. (d)
Tutorial: Page 26 and 27

1.
(a)
(b)
ŷ = 11.238 + 1.309x
For each 3-year increase in smoking, we expect lung capacity to increase by 1.309 ×
3 = 3.927.
(c)
(d)
Predicted lung capacity = 11.238 + 1.309 × 30 = 50.5

Predicted lung capacity = 11.238 + 1.309 × 25 = 44.0
Residual = Observed value − predicted value = 55 − 44.0 = 11
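The predictions and the residual above can be reproduced with a short Python sketch. The intercept and slope are the fitted values quoted in part (a); the function name `predict` is illustrative only.

```python
# Fitted least squares line quoted in the answer: y-hat = 11.238 + 1.309 x.
INTERCEPT = 11.238
SLOPE = 1.309


def predict(years_smoking):
    """Predicted lung capacity for a given number of years smoking."""
    return INTERCEPT + SLOPE * years_smoking


change_per_3_years = SLOPE * 3   # expected change for a 3-year increase
pred_30 = predict(30)            # prediction at 30 years
pred_25 = predict(25)            # prediction at 25 years
residual = 55 - pred_25          # observed value minus predicted value

print(f"{change_per_3_years:.3f} {pred_30:.1f} {pred_25:.1f} {residual:.0f}")
```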
(e)
Years smoking is used to predict lung capacity.

Years smoking is a numeric variable and Lung capacity is continuous and random.
(f)
There is a possible linear trend but the observations (28, 30) and (33, 35) are possible
outliers which cause concern with the appropriateness of the model.
H0: β1 = 0
H1: β1 ≠ 0
P-value = 0.0086
There is strong evidence that an increase in years of smoking is associated with an
increase in lung capacity.
With 95% confidence, we estimate that for every additional year of smoking an
emphysema patient's lung capacity increases by between 0.44 and 2.18 units.
(g)
With 95% confidence, we estimate that the mean lung capacity for people like those in
the study who spent 30 years smoking will be somewhere between 42.16 and 58.86.
With 95% confidence, we predict that the lung capacity for a person like those in the
study who spent 30 years smoking will be somewhere between 23.33 and 77.70.
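The two intervals in part (g) can be compared numerically. The sketch below uses only the interval endpoints quoted above and shows that both intervals are centred (up to rounding) on the predicted value at 30 years, with the prediction interval being wider; the helper name is illustrative.

```python
# Interval endpoints quoted in the answer for x = 30 years smoking.
confidence_interval = (42.16, 58.86)   # for the mean lung capacity
prediction_interval = (23.33, 77.70)   # for an individual's lung capacity


def centre_and_half_width(interval):
    """Return the midpoint and half-width of a (lower, upper) interval."""
    lower, upper = interval
    return (lower + upper) / 2, (upper - lower) / 2


ci_centre, ci_half = centre_and_half_width(confidence_interval)
pi_centre, pi_half = centre_and_half_width(prediction_interval)

print(f"CI: centre {ci_centre:.2f}, half-width {ci_half:.2f}")
print(f"PI: centre {pi_centre:.2f}, half-width {pi_half:.2f}")
```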
2.
(5)
3.
(2)
4.
(3)
5.
(2)
