QUANTITATIVE METHODS
WEEK 7
Dr. Zahra Sadeghinejad
ONE-WAY ANOVA
The one-way analysis of variance is used to test the claim that three
or more population means are equal.
Conditions or Assumptions
The data are randomly sampled
The population variances of the groups are assumed equal
The residuals are normally distributed
The null hypothesis is that the means are all equal
The alternative hypothesis is that at least one of the means is
different
EXAMPLE 1
A clinical trial is run to compare weight loss programs and participants are
randomly assigned to one of the comparison programs and are
counselled on the details of the assigned program. Participants follow the
assigned program for 8 weeks.
The outcome of interest is weight loss, defined as the difference in weight
measured at the start of the study (baseline) and weight measured at the
end of the study (8 weeks), measured in pounds.
Three popular weight loss programs are considered: a low calorie diet, a
low fat diet and a low carbohydrate diet.
For comparison purposes, a fourth group is considered as a control
group. Participants in the fourth group are told that they are participating
in a study of healthy behaviours with weight loss only one component of
interest.
The control group is included here to assess the placebo effect (i.e.,
weight loss due to simply participating in the study).
EXAMPLE 1
A total of twenty patients agree to participate in the study and are
randomly assigned to one of the four diet groups.
Weights are measured at baseline and patients are counselled on
the proper implementation of the assigned diet (with the exception of
the control group).
After 8 weeks, each patient's weight is again measured and the
difference in weights is computed by subtracting the 8 week weight
from the baseline weight.
Positive differences indicate weight losses and negative differences
indicate weight gains.
For interpretation purposes, we refer to the differences in weights as
weight losses. The observed weight losses (in pounds) are shown below:
Low Calorie    Low Fat    Low Carbohydrate    Control
8              2          3                   2
9              4          5                   2
6              3          4                   -1
7              5          2                   0
3              1          3                   3
Is there a statistically significant difference in the mean weight loss
among the four diets?
We will run the ANOVA using the five-step approach.
ANOVA USING THE FIVE-STEP APPROACH
Step 1. Set up hypotheses and determine level of
significance
H0: μ1 = μ2 = μ3 = μ4
H1: Means are not all equal
α = 0.05
Step 2. Select the appropriate test statistic.
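The test statistic is the F statistic for ANOVA,
F = MSB/MSE (also written MSB/MSW), where MSB is the mean square between
treatments and MSE (or MSW) is the mean square error within groups.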
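For reference, the ratio can be unpacked in terms of the sums of squares. The notation here is an assumption chosen to match the ANOVA table used in these slides: k groups, N total observations, n_j observations in group j, \bar{X}_j for a group mean and \bar{X} for the grand mean.

```latex
\begin{aligned}
F &= \frac{MSB}{MSE}, \qquad MSB = \frac{SSB}{k-1}, \qquad MSE = \frac{SSE}{N-k} \\
SSB &= \sum_{j=1}^{k} n_j \left(\bar{X}_j - \bar{X}\right)^2 \\
SSE &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_j\right)^2
\end{aligned}
```

A large F means the group means are spread out more than the within-group scatter alone would suggest.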
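ANOVA USING THE FIVE-STEP APPROACH
Step 3. Set up the decision rule.
With k = 4 groups and N = 20 observations, df1 = k-1 = 3 and df2 = N-k = 16.
At α = 0.05 the critical value of F is 3.24, so we reject H0 if F ≥ 3.24.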
Step 4. Compute the test statistic.
To organize our computations we complete the ANOVA table:
Source of Variation             Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F
Between Treatments              SSB                     k-1                        MSB = SSB/(k-1)      F = MSB/MSE
Error (or Residual) (within)    SSE                     N-k                        MSE = SSE/(N-k)
Total                           SST                     N-1
ANOVA USING THE FIVE-STEP APPROACH
In order to compute the sums of squares we must first compute the
sample means for each group and the overall mean based on the total
sample.
              Low Calorie    Low Fat    Low Carbohydrate    Control
n             5              5          5                   5
Group mean    6.6            3.0        3.4                 1.2
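The group means can be reproduced directly from the weight-loss table. The following is a minimal sketch using only the Python standard library; the dictionary name `weight_loss` is an illustrative choice, not something defined in the slides.

```python
from statistics import mean

# Observed weight losses (pounds) copied from the Example 1 table
weight_loss = {
    "Low Calorie": [8, 9, 6, 7, 3],
    "Low Fat": [2, 4, 3, 5, 1],
    "Low Carbohydrate": [3, 5, 4, 2, 3],
    "Control": [2, 2, -1, 0, 3],
}

# Mean of each group, then the mean of all 20 observations pooled together
group_means = {name: mean(values) for name, values in weight_loss.items()}
all_values = [x for values in weight_loss.values() for x in values]
grand_mean = mean(all_values)

for name, m in group_means.items():
    print(f"{name}: n={len(weight_loss[name])}, mean={m:.1f}")
print(f"Grand mean: {grand_mean:.2f}")
```

The pooled mean comes out to 71/20 = 3.55, which rounds to 3.6 at one decimal place.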
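Example 1
If we pool all N = 20 observations, the grand mean is 3.6.
We can now compute:
$$SSB = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2$$
So, in this case:
$$SSB = 5(6.6-3.6)^2 + 5(3.0-3.6)^2 + 5(3.4-3.6)^2 + 5(1.2-3.6)^2 = 45.0 + 1.8 + 0.2 + 28.8 = 75.8$$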
Example 1
SSE requires computing the squared differences between each observation
and its group mean. We will compute SSE in parts.
For the participants in the low calorie diet:
Low Calorie    (X - 6.6)    (X - 6.6)²
8              1.4          2.0
9              2.4          5.8
6              -0.6         0.4
7              0.4          0.2
3              -3.6         13.0
Totals         0            21.4
Example 1
For the participants in the low fat diet:
Low Fat        (X - 3.0)    (X - 3.0)²
2              -1.0         1.0
4              1.0          1.0
3              0.0          0.0
5              2.0          4.0
1              -2.0         4.0
Totals         0            10.0
EXAMPLE 1
For the participants in the low carbohydrate diet:
Low Carbohydrate    (X - 3.4)    (X - 3.4)²
3                   -0.4         0.2
5                   1.6          2.6
4                   0.6          0.4
2                   -1.4         2.0
3                   -0.4         0.2
Totals              0            5.4
EXAMPLE 1
For the participants in the control group:
Control        (X - 1.2)    (X - 1.2)²
2              0.8          0.6
2              0.8          0.6
-1             -2.2         4.8
0              -1.2         1.4
3              1.8          3.2
Totals         0            10.6
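The four partial sums can be checked with a short script. This is a sketch under the assumption that the group mean is carried at full precision; the helper name `squared_deviations` is hypothetical. Because the slide tables round every squared deviation to one decimal place, the full-precision parts differ slightly from the tabulated totals.

```python
from statistics import mean

# Observed weight losses (pounds) copied from the Example 1 table
weight_loss = {
    "Low Calorie": [8, 9, 6, 7, 3],
    "Low Fat": [2, 4, 3, 5, 1],
    "Low Carbohydrate": [3, 5, 4, 2, 3],
    "Control": [2, 2, -1, 0, 3],
}

def squared_deviations(values):
    """Sum of squared differences between each value and the group mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

# One SSE part per group, then the overall SSE as their sum
sse_parts = {name: squared_deviations(v) for name, v in weight_loss.items()}
sse = sum(sse_parts.values())

for name, part in sse_parts.items():
    print(f"{name}: {part:.1f}")
print(f"SSE = {sse:.1f}")
```

At full precision the parts are 21.2, 10.0, 5.2 and 10.8, for a total of 47.2; the small gap from the tabulated figures comes only from rounding each row.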
EXAMPLE 1
Summing the four parts, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4.
We can now construct the ANOVA table.
Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F
Between Treatments     75.8                    4-1=3                      75.8/3=25.3          25.3/3.0=8.43
Error (or Residual)    47.4                    20-4=16                    47.4/16=3.0
Total                  123.2                   20-1=19
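The whole table can also be produced in one pass. The function below is a minimal sketch, not a reference implementation; `one_way_anova` and its dictionary keys are names invented for this illustration. It works at full precision, so its figures differ a little from the hand-rounded table.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # N, total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]

    # Between treatments: n_j * (group mean - grand mean)^2, summed over groups
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error: squared deviation of each observation from its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_error = k - 1, n_total - k
    msb, mse = ssb / df_between, sse / df_error
    return {
        "SSB": ssb, "SSE": sse, "SST": ssb + sse,
        "df_between": df_between, "df_error": df_error,
        "MSB": msb, "MSE": mse, "F": msb / mse,
    }

weight_loss = [
    [8, 9, 6, 7, 3],     # Low Calorie
    [2, 4, 3, 5, 1],     # Low Fat
    [3, 5, 4, 2, 3],     # Low Carbohydrate
    [2, 2, -1, 0, 3],    # Control
]
table = one_way_anova(weight_loss)
for key, value in table.items():
    print(f"{key}: {value:.2f}")
```

With unrounded intermediate values this gives SSB = 75.75, SSE = 47.20 and F of about 8.56, close to the 8.43 obtained from the rounded table entries.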
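EXAMPLE 1
Step 5. Conclusion.
We reject H0 because 8.43 > 3.24, the critical value of F with df1 = 3
and df2 = 16 at α = 0.05. We have statistically significant evidence at
α = 0.05 to show that there is a difference in mean weight loss among the
four diets.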
EXAMPLE 2
Calcium is an essential mineral that regulates the heart. It is
important for blood clotting and for building healthy bones.
The National Osteoporosis Foundation recommends a daily calcium
intake of 1000-1200 mg/day for adult men and women. While
calcium is contained in some foods, most adults do not get enough
calcium in their diets and take supplements. Unfortunately, some of
the supplements have side effects such as gastric distress, making
them difficult for some patients to take on a regular basis.
EXAMPLE 2
A study is designed to test whether there is a difference in mean daily
calcium intake in adults with normal bone density, adults with
osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone
density, osteopenia and osteoporosis are selected at random from
hospital records and invited to participate in the study. Each
participant's daily calcium intake is measured based on reported food
intake and supplements.
The data (daily calcium intake in mg/day) are shown below. Is there a
statistically significant difference in mean calcium intake in patients
with normal bone density as compared to patients with osteopenia and
osteoporosis?
We will run the ANOVA using the five-step approach.
Normal Bone Density    Osteopenia    Osteoporosis
1200                   1000          890
1000                   1100          650
980                    700           1100
900                    800           900
750                    500           400
800                    700           350
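EXAMPLE 2
Step 1. Set up hypotheses and determine level of significance
H0: μ1 = μ2 = μ3
H1: Means are not all equal
α = 0.05
Step 2. Select the appropriate test statistic.
The test statistic is the F statistic for ANOVA, F = MSB/MSE.
Step 3. Set up the decision rule.
With k = 3 groups and N = 18 observations, df1 = k-1 = 2 and df2 = N-k = 15.
At α = 0.05 the critical value of F is 3.68, so we reject H0 if F ≥ 3.68.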
Step 4. Compute the test statistic.
To organize our computations we will complete
the ANOVA table. In order to compute the sums of
squares we must first compute the sample means
for each group and the overall mean.
              Normal Bone Density    Osteopenia    Osteoporosis
n             n1 = 6                 n2 = 6        n3 = 6
Group mean    938.3                  800.0         715.0
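EXAMPLE 2
If we pool all N = 18 observations, the grand mean is
817.8.
We can now compute:
$$SSB = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2$$
Substituting:
$$SSB = 6(938.3-817.8)^2 + 6(800.0-817.8)^2 + 6(715.0-817.8)^2$$
Finally,
$$SSB = 87,121.5 + 1,901.0 + 63,407.0 \approx 152,429.6$$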
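The between-treatments sum of squares can be recomputed from the raw calcium data. The sketch below does it twice, once with means rounded to one decimal place (as in a hand calculation) and once at full precision; the variable names are illustrative assumptions.

```python
from statistics import mean

# Daily calcium intake (mg/day) copied from the Example 2 table
intake = {
    "Normal Bone Density": [1200, 1000, 980, 900, 750, 800],
    "Osteopenia": [1000, 1100, 700, 800, 500, 700],
    "Osteoporosis": [890, 650, 1100, 900, 400, 350],
}

all_values = [x for values in intake.values() for x in values]
grand_mean = mean(all_values)

# Full precision: n_j * (group mean - grand mean)^2, summed over groups
ssb_exact = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in intake.values())

# Hand-calculation style: round every mean to one decimal place first
ssb_rounded = sum(
    len(v) * (round(mean(v), 1) - round(grand_mean, 1)) ** 2
    for v in intake.values()
)

print(f"SSB (full precision): {ssb_exact:,.1f}")
print(f"SSB (rounded means):  {ssb_rounded:,.1f}")
```

Rounding the means first reproduces the hand-calculated 152,429.6, while full precision gives a slightly larger value; the difference is too small to affect the test.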
EXAMPLE 2
Next,
SSE requires computing the squared differences between each
observation and its group mean. We will compute SSE in parts. For
the participants with normal bone density:
Normal Bone Density    (X - 938.3)    (X - 938.3)²
1200                   261.7          68,486.9
1000                   61.7           3,806.9
980                    41.7           1,738.9
900                    -38.3          1,466.9
750                    -188.3         35,456.9
800                    -138.3         19,126.9
Total                  0              130,083.4
EXAMPLE 2
For participants with osteopenia:
Osteopenia      (X - 800.0)    (X - 800.0)²
1000            200.0          40,000.0
1100            300.0          90,000.0
700             -100.0         10,000.0
800             0.0            0.0
500             -300.0         90,000.0
700             -100.0         10,000.0
Total           0              240,000.0
EXAMPLE 2
For participants with osteoporosis:
Osteoporosis    (X - 715.0)    (X - 715.0)²
890             175.0          30,625.0
650             -65.0          4,225.0
1100            385.0          148,225.0
900             185.0          34,225.0
400             -315.0         99,225.0
350             -365.0         133,225.0
Total           0              449,750.0
Example 2
Summing the three parts, SSE = 130,083.4 + 240,000.0 + 449,750.0 = 819,833.4.
Source of Variation    Sums of Squares (SS)    Degrees of freedom (df)    Mean Squares (MS)    F
Between Treatments     152,429.6               2                          76,214.8             1.39
Error or Residual      819,833.4               15                         54,655.6
Total                  972,263.0               17
Example 2
We do not reject H0 because 1.39 < 3.68. We do not have
statistically significant evidence at α = 0.05 to show that there
is a difference in mean calcium intake in patients with normal
bone density as compared to osteopenia and osteoporosis.
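The critical value 3.68 would normally be read from an F table. When the numerator degrees of freedom equal 2, as here, the F distribution has a simple closed form, so the value can be checked without a table. The sketch below relies on that special case only; the function names are hypothetical and the formula does not apply for other numerator degrees of freedom.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-tail critical value of F(2, df2).

    Closed form valid ONLY when the numerator df is 2, where
    P(F > f) = (1 + 2*f/df2) ** (-df2/2).
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

def reject_h0(f_stat, alpha, df2):
    """Decision rule: reject H0 when F reaches the critical value."""
    return f_stat >= f_critical_df1_2(alpha, df2)

# Example 2: k = 3 groups and N = 18 observations, so df1 = 2 and df2 = 15
critical = f_critical_df1_2(0.05, 15)
print(round(critical, 2))   # 3.68
```

Any F statistic below this threshold, including the one in the table above, leads to the same decision not to reject H0.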
Example 3
The statistics classroom is divided into three rows: front, middle,
and back
The instructor noticed that the further the students were from him,
the more likely they were to miss class or use an instant
messenger during class
He wanted to see if the students further away did worse on the
exams
Example 3
A random sample of the students in each row was taken
The score for those students on the second exam was recorded
Front: 82, 83, 97, 93, 55, 67, 53
Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
Back: 38, 59, 55, 66, 45, 52, 52, 61
The summary statistics for the grades of each row are
shown in the table below
Row Front Middle Back
Sample size 7 9 8
Mean 75.71 67.11 53.50
St. Dev 17.63 10.95 8.96
Variance 310.90 119.86 80.29
H0: μF = μM = μB
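EXAMPLE 3
Grand Mean
The grand mean is the average of all the values when the factor is ignored
It is a weighted average of the individual sample means
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
$$\bar{\bar{x}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$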
One-way ANOVA
Grand Mean for our example is 65.08
$$\bar{\bar{x}} = \frac{7(75.71) + 9(67.11) + 8(53.50)}{7 + 9 + 8} = \frac{1562}{24} = 65.08$$
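The weighting matters because the rows have different sample sizes. As a small sketch (variable names are illustrative), the weighted grand mean can be compared with a plain average of the three row means:

```python
# Summary statistics from the table: Front, Middle, Back
sizes = [7, 9, 8]
means = [75.71, 67.11, 53.50]

# Weighted average: each row mean counts in proportion to its sample size
grand_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Unweighted average of the three means, shown only for contrast
naive_mean = sum(means) / len(means)

print(round(grand_mean, 2))   # 65.08
print(round(naive_mean, 2))   # 65.44
```

The two values differ because the front row, which has the highest mean, also has the fewest students.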
Example 3
Between Group Variation, SS(B)
The between group variation is the variation between each
sample mean and the grand mean
Each individual variation is weighted by the sample size
$$SS(B) = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$$
$$SS(B) = n_1 (\bar{x}_1 - \bar{\bar{x}})^2 + n_2 (\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k (\bar{x}_k - \bar{\bar{x}})^2$$
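Example 3
The Between Group Variation for our example is SS(B)=1902
$$SS(B) = 7(75.71-65.08)^2 + 9(67.11-65.08)^2 + 8(53.50-65.08)^2$$
$$SS(B) = 1900.8376 \approx 1902$$
The value 1900.8376 uses the rounded means shown above; carrying the
unrounded means through the calculation gives 1901.5, which rounds to 1902.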
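Both routes to SS(B) can be reproduced in a few lines. This is a sketch using the summary statistics for one calculation and the raw exam scores for the other; the variable names are assumptions made for the illustration.

```python
from statistics import mean

# Route 1: rounded summary statistics, as in the hand calculation
sizes = [7, 9, 8]
means = [75.71, 67.11, 53.50]
grand_mean = 65.08
ss_between_rounded = sum(
    n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)
)

# Route 2: raw exam scores, keeping full precision throughout
rows = [
    [82, 83, 97, 93, 55, 67, 53],            # Front
    [83, 78, 68, 61, 77, 54, 69, 51, 63],    # Middle
    [38, 59, 55, 66, 45, 52, 52, 61],        # Back
]
exact_grand_mean = mean([score for row in rows for score in row])
ss_between_exact = sum(
    len(row) * (mean(row) - exact_grand_mean) ** 2 for row in rows
)

print(f"{ss_between_rounded:.4f}")
print(f"{ss_between_exact:.2f}")
```

The gap between the two results is purely a rounding effect and has no practical impact on the F statistic.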
Example 3
Within Group Variation, SS(W)
The Within Group Variation is the weighted total of the
individual variations
The weighting is done with the degrees of freedom
The df for each sample is one less than the sample size for
that sample.
Example 3
Within Group Variation
$$SS(W) = \sum_{i=1}^{k} df_i \, s_i^2$$
$$SS(W) = df_1 s_1^2 + df_2 s_2^2 + \cdots + df_k s_k^2$$
Example 3
The within group variation for our example is 3386
$$SS(W) = 6(310.90) + 8(119.86) + 7(80.29)$$
$$SS(W) = 3386.31 \approx 3386$$
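Multiplying each sample variance by its degrees of freedom simply recovers that row's sum of squared deviations. A minimal sketch from the raw exam scores, using `statistics.variance` (the sample variance, which divides by n - 1):

```python
from statistics import variance

rows = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

ss_within = 0.0
for name, scores in rows.items():
    df = len(scores) - 1        # one less than the sample size
    s2 = variance(scores)       # sample variance of this row
    ss_within += df * s2        # df * s^2 = sum of squared deviations
    print(f"{name}: df={df}, variance={s2:.2f}")

print(f"SS(W) = {ss_within:.2f}")
```

The printed variances match the summary table, and the total agrees with the hand calculation up to rounding.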
Example 3
After filling in the sum of squares, we have …
Source SS df MS F
Between 1902
Within 3386
Total 5288
Example 3
Degrees of Freedom, df
The between group df is one less than the number of
groups
We have three groups, so df(B) = 2
The within group df is the sum of the individual df’s of
each group
The sample sizes are 7, 9, and 8
df(W) = 6 + 8 + 7 = 21
The total df is one less than the sample size
df(Total) = 24 – 1 = 23
Example 3
Filling in the degrees of freedom gives this …
Source SS df MS F
Between 1902 2
Within 3386 21
Total 5288 23
Example 3
MS(B) = 1902 / 2 = 951.0
MS(W) = 3386 / 21 = 161.2
MS(T) = 5288 / 23 = 229.9
Notice that the MS(Total) is NOT the sum of
MS(Between) and MS(Within).
This works for the sum of squares SS(Total), but
not the mean square MS(Total)
The MS(Total) isn’t usually shown
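The additivity point is easy to confirm numerically. A short sketch with the table values (the dictionary names are illustrative):

```python
# Sums of squares and degrees of freedom from the ANOVA table
ss = {"Between": 1902, "Within": 3386}
df = {"Between": 2, "Within": 21}

# SS and df both add up to their totals
ss_total = sum(ss.values())     # 5288
df_total = sum(df.values())     # 23

# Mean squares are ratios, so they do not add up
ms = {source: ss[source] / df[source] for source in ss}
ms_total = ss_total / df_total

print(round(ms["Between"], 1), round(ms["Within"], 1), round(ms_total, 1))
print(round(ms["Between"] + ms["Within"], 1))   # not equal to MS(Total)
```

The sum of the two mean squares is far larger than MS(Total), because each mean square divides by a different number of degrees of freedom.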
ONE-WAY ANOVA
Completing the MS gives …
Source SS df MS F
Between 1902 2 951.0
Within 3386 21 161.2
Total 5288 23 229.9
Example 3
Adding F = MS(B) / MS(W) = 951.0 / 161.2 = 5.9 to the table …
Source SS df MS F
Between 1902 2 951.0 5.9
Within 3386 21 161.2
Total 5288 23 229.9
Example 3
Completing the table with the p-value
Source SS df MS F p
Between 1902 2 951.0 5.9 0.0093
Within 3386 21 161.2
Total 5288 23 229.9
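The p-value is the probability of an F statistic at least this large when H0 is true. With 2 numerator degrees of freedom the upper tail of the F distribution has a closed form, so it can be evaluated without statistical software. The function name below is hypothetical, and the formula holds only for this special case.

```python
def f_pvalue_df1_2(f_stat, df2):
    """Upper-tail p-value for F(2, df2).

    Closed form valid ONLY when the numerator df is 2:
    P(F > f) = (1 + 2*f/df2) ** (-df2/2).
    """
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)

# Example 3: F = MS(B) / MS(W) with 2 and 21 degrees of freedom
f_stat = 951.0 / 161.2
p_value = f_pvalue_df1_2(f_stat, 21)

print(f"F = {f_stat:.1f}, p = {p_value:.4f}")
```

A p-value below 0.05 leads to the same decision as comparing F with the critical value.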
Example 3
The F test statistic (5.9) is greater than the critical value of F
(3.47 at α = 0.05 with 2 and 21 degrees of freedom), so we
reject the null hypothesis.
The null hypothesis is that the means of the three
rows in class were the same, but we reject that, so at
least one row has a different mean.
There is enough evidence to support the claim that
there is a difference in the mean scores of the front,
middle, and back rows in class.
The ANOVA doesn’t tell us which row is different; for that you
would need to look at confidence intervals.