UNIT-8
INFERENTIAL STATISTICS: ANOVA
Written By:
Prof. Dr. Nasir Mahmood
Reviewed By:
Dr. Rizwan Akram Rana
Introduction
Analysis of Variance (ANOVA) is a statistical procedure used to test the degree to which
two or more groups vary or differ in an experiment. This unit will give you an insight
into ANOVA and its logic, one-way ANOVA with its assumptions and procedure, the
F-distribution and its interpretation, and multiple comparison procedures.
Objectives
After reading this unit you will be able to:
1. explain what ANOVA is.
2. write down the logic behind using ANOVA.
3. explain what the F-distribution is.
4. explain the logic behind one-way ANOVA.
5. explain the assumptions underlying one-way ANOVA.
6. explain multiple comparison procedures.
8.1 Introduction to Analysis of Variance (ANOVA)
The t-tests have one very serious limitation: they are restricted to tests of the
significance of the difference between only two groups. There are many occasions when we
would like to see if there are significant differences among three, four, or even more groups.
For example, we may want to investigate which of three teaching methods is best for teaching
ninth-class algebra. In such cases, we cannot use a t-test because more than two groups are
involved. To deal with such cases, one of the most useful techniques in statistics is the
analysis of variance (abbreviated as ANOVA). This technique was developed by the British
statistician Ronald A. Fisher (Dietz & Kalof, 2009; Bartz, 1981).
Analysis of Variance (ANOVA) is a hypothesis-testing procedure that is used to evaluate
mean differences between two or more treatments (or populations). Like all other
inferential procedures, ANOVA uses sample data as a basis for drawing general
conclusions about populations. Sometimes it may appear that ANOVA and the t-test are two
different ways of doing exactly the same thing: testing for mean differences. In some cases
this is true: both tests use sample data to test hypotheses about population means.
However, ANOVA has a clear advantage over the t-test. t-tests are used when we have to
compare only two groups or variables (one independent and one dependent). On the other
hand, ANOVA is used when we have two or more than two treatment conditions to
compare. Suppose we want to study the effects of three different models of teaching on
the achievement of students. In this case we have three different samples to be treated
using three different treatments, so ANOVA is the suitable technique to evaluate the
difference.
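To see how this looks in practice, the following is a minimal sketch (not part of the
original unit) of a one-way ANOVA in Python using SciPy's f_oneway function; the
achievement scores for the three teaching models are invented purely for illustration.

    # A hedged sketch: one-way ANOVA for three hypothetical teaching models.
    from scipy import stats

    model_a = [78, 82, 75, 80, 77]   # hypothetical achievement scores
    model_b = [85, 88, 84, 90, 86]
    model_c = [70, 72, 68, 75, 71]

    f_stat, p_value = stats.f_oneway(model_a, model_b, model_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
    # A p-value below .05 suggests at least one group mean differs.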
8.1.1 Logic of ANOVA
Let us consider the hypothetical data given in Table 8.1.
Table 8.1
Hypothetical data from an experiment examining learning performance under three
temperature conditions
Treatment 1 (50°)    Treatment 2 (70°)    Treatment 3 (90°)
Sample 1             Sample 2             Sample 3
    0                    4                    1
    1                    3                    2
    3                    6                    2
    1                    3                    0
    0                    4                    0
 X̄ = 1                X̄ = 4                X̄ = 1
There are three separate samples, with n = 5 in each sample. The dependent variable is
the number of problems solved correctly.
These data represent the results of an independent-measures experiment comparing learning
performance under three temperature conditions. The scores vary, and we want to
measure the amount of variability (i.e., the size of the differences) to explain where it comes
from. To measure the total variability, we combine all the scores from all the
separate samples into one group and then obtain one general measure of variability for
the complete experiment. Once we have measured the total variability, we can begin to
break it into separate components. The word analysis means breaking into smaller parts.
Because we are going to analyze the variability, the process is called analysis of variance
(ANOVA). This analysis process divides the total variability into two basic components:
i) Between-Treatment Variance
Variance simply means difference, and calculating the variance is a process of
measuring how big the differences are for a set of numbers. The between-treatment
variance measures how much difference exists between the treatment
conditions. In addition to measuring the differences between treatments, the overall
goal of ANOVA is to evaluate those differences. Specifically, the
purpose of the analysis is to distinguish between two alternative
explanations:
a) The differences between the treatments have been caused by the treatment effects.
b) The differences between the treatments are simply due to chance.
Thus, there are always two possible explanations for the variance (difference) that exists
between treatments:
1) Treatment Effect: The differences are caused by the treatments. For the data in
Table 8.1, the scores in sample 1 are obtained at a room temperature of 50° and those of
sample 2 at 70°. It is possible that the difference between the samples is caused by the
difference in room temperature.
2) Chance: The differences are simply due to chance. If there is no treatment effect,
we can still expect some difference between samples. Chance differences
are unplanned and unpredictable differences that are not caused or explained by
any action of the researcher. Researchers commonly identify two primary sources
of chance differences.
Individual Differences
Each participant in the study has his or her own individual characteristics. Although
it is reasonable to expect that different subjects will produce different scores,
it is impossible to predict exactly what the difference will be.
Experimental Error
In any measurement there is a chance of some degree of error. Thus, if a
researcher measures the same individual twice under the same conditions, there is
a strong possibility of obtaining two different measurements. Since these differences
are unplanned and unpredictable, they are considered to be due to chance.
Thus, when we calculate the between-treatment variance, we are measuring differences
that could be either a treatment effect or simply due to chance. In order to
demonstrate that the difference is really a treatment effect, we must establish that the
differences between treatments are bigger than would be expected by chance alone. To
accomplish this goal, we determine how big the differences are when there is no
treatment effect involved. That is, we measure how much difference (variance)
occurs by chance. To measure chance differences, we compute the variance within
treatments.
ii) Within-Treatment Variance
Within each treatment condition, we have a set of individuals who are treated
exactly the same, and the researcher does not do anything that would cause these
individual participants to have different scores. For example, in Table 8.1 the data
show that five individuals were treated at a 70° room temperature. Although these
five students were all treated exactly the same, their scores are different. The question
is: why are the scores different? The plain answer is that the differences are due to
chance. Figure 8.1 shows the overall analysis of variance and identifies the sources of
variability that are measured by each of the two basic components.
Total Variability is partitioned into two components:
Between-Treatment Variance, which measures differences due to: i. treatment effect, ii. chance
Within-Treatment Variance, which measures differences due to: i. chance

Fig. 8.1: The independent-measures analysis of variance partitions, or analyzes, the total
variability into two components: variance between treatments and variance within treatments.
8.2 The F-Distribution
After analyzing the total variability into two basic components (between treatments and
within treatments), the next step is to compare them. The comparison is made by
computing a statistic called the F-ratio. For the independent-measures ANOVA, the F-ratio is
calculated using the formula:
F = variance between treatments / variance within treatments

or, in terms of what each variance measures,

F = (treatment effect + differences due to chance) / (differences due to chance)
The value obtained for the F-ratio helps determine whether or not any treatment effect
exists. Consider the two possibilities stated above.
1. When the treatment has no effect, the differences between treatments will
be entirely due to chance. In this case the numerator and the denominator of the
F-ratio are both measuring the same chance differences, and the F-ratio should
have a value near 1.00. In terms of the formula, we have
F = (0 + differences due to chance) / (differences due to chance)
  = 1.00
An F-ratio near 1.00 indicates that the differences between treatments are about the
same as the differences expected by chance. So, when the F-ratio is near 1.00, we
conclude that there is no evidence to suggest that the treatment has any effect.
2. When the treatment does have an effect, the between-treatments differences
(numerator) should be larger than chance (denominator). In this case the numerator of
the F-ratio should be considerably larger than the denominator, and we should obtain an
F-ratio larger than 1.00. Thus, a large F-ratio indicates that the differences between
treatments are greater than chance; that is, the treatment does have a significant effect.
8.2.1 Interpretation of the F-Statistic
The denominator in the F-statistic normalizes our estimate of the variance assuming that Ho
is true. Hence, if F = 2, then our sample has two times as much variance as we would
expect if Ho were true. If F = 10, then our sample has 10 times as much variance as we
would expect if Ho were true. Ten times is quite a bit more variance than we would expect.
In fact, for denominator degrees of freedom larger than 4 and any number of numerator
degrees of freedom, we would reject Ho at the 5% level with an F-statistic of 10.
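The critical values behind this claim can be verified directly from the F-distribution.
The following is a minimal sketch in Python, assuming SciPy is available; it prints 5%
critical values for several degrees-of-freedom combinations, all of which fall below 10
once the denominator degrees of freedom exceed 4.

    # Sketch: 5% critical values of the F-distribution.
    from scipy import stats

    for df_num in (1, 2, 5, 10):
        for df_den in (5, 10, 30):
            crit = stats.f.ppf(0.95, df_num, df_den)  # 95th percentile
            print(f"df = ({df_num:2d}, {df_den:2d}): F critical = {crit:.2f}")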
8.3 One Way ANOVA (Logic and Procedure)
The one-way analysis of variance (ANOVA) is an extension of the independent two-sample
t-test. It is a statistical technique by which we can test whether three or more means are
equal. It tests whether the value of a single variable differs significantly among three or
more levels of a factor. We can also say that one-way ANOVA is a procedure for testing the
hypothesis that K population means are equal, where K ≥ 2. It compares the means of the
samples or groups in order to make inferences about the population means. Specifically,
it tests the null hypothesis:
Ho : µ1 = µ2 = µ3 = ... = µk
Where µ = group mean and k = number of groups
If one-way ANOVA yields a statistically significant result, we accept the alternative
hypothesis (HA), which states that at least two group means are statistically
significantly different from each other. It should be kept in mind that one-way
ANOVA cannot tell which specific groups were statistically significantly different from
each other. To determine which specific groups differ from each other, a
researcher has to use a post hoc test.
As there is only one independent variable or factor in one-way ANOVA, it is also
called single-factor ANOVA. The independent variable has nominal levels or a few
ordinal levels. Also, there is only one dependent variable, and hypotheses are formulated
about the means of the groups on the dependent variable. The dependent variable
differentiates individuals on some quantitative dimension.
8.3.1 Assumptions Underlying the One Way ANOVA
There are three main assumptions:
i) Assumption of Independence
According to this assumption, the observations are random and independent
samples from the populations. The null hypothesis actually states that the samples
come from populations that have the same mean. The samples must be random and
independent if they are to be representative of the populations. The value of one
observation is not related to any other observation. In other words, one individual's
score should not provide any clue as to how any of the other individuals will
score. That is, one event does not depend on another.
Violation of the assumption of independence has the most serious consequences. If this
assumption is violated, one-way ANOVA is an inappropriate statistic.
ii) Assumption of Normality
The distributions of the populations from which the samples are selected are normal.
This assumption implies that the dependent variable is normally distributed in each
of the groups.
One-way ANOVA is considered a robust test against the assumption of normality and
tolerates violations of this assumption. As regards the normality of grouped data,
one-way ANOVA can tolerate data that are non-normal (skewed or kurtotic distributions) with
only a small effect on the Type I error rate. However, platykurtosis can have a profound
effect when group sizes are small. This leaves a researcher with two options:
i) Transform the data using various algorithms so that the shape of the distribution
becomes normal, or
ii) Choose the nonparametric Kruskal-Wallis H test, which does not require the
assumption of normality (this test is available in SPSS); a sketch follows this list.
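The following is a minimal sketch of option (ii), using the Kruskal-Wallis H test as
implemented in SciPy (scipy.stats.kruskal) on the Table 8.1 scores; SPSS would produce
the equivalent result through its menus.

    # Sketch: nonparametric Kruskal-Wallis H test (no normality assumption).
    from scipy import stats

    sample1 = [0, 1, 3, 1, 0]   # Table 8.1, treatment 1 (50°)
    sample2 = [4, 3, 6, 3, 4]   # treatment 2 (70°)
    sample3 = [1, 2, 2, 0, 0]   # treatment 3 (90°)

    h_stat, p_value = stats.kruskal(sample1, sample2, sample3)
    print(f"H = {h_stat:.2f}, p = {p_value:.4f}")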
iii) Assumption of Homogeneity of Variance
The variances of the distributions in the populations are equal. Combined with the
null hypothesis, this assumption provides that the distributions in the populations
have the same shapes, means, and variances; that is, they are the same populations.
In other words, the variances of the dependent variable are equal across the groups.
If the assumption of homogeneity of variances has been violated, then two possible tests
can be run:
i) the Welch test, or
ii) the Brown and Forsythe test.
Alternatively, the Kruskal-Wallis H test can also be used. All these tests are available in
SPSS; a sketch of a related diagnostic follows.
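SciPy does not ship the Welch or Brown-Forsythe ANOVA directly, so the sketch below
shows a swapped-in diagnostic instead: Levene's test (not named above) for checking
whether the homogeneity-of-variance assumption holds in the first place.

    # Sketch: checking homogeneity of variance with Levene's test.
    # (SciPy's default uses the median-centered variant. This diagnoses the
    # assumption; it is not the Welch or Brown-Forsythe ANOVA itself.)
    from scipy import stats

    sample1 = [0, 1, 3, 1, 0]
    sample2 = [4, 3, 6, 3, 4]
    sample3 = [1, 2, 2, 0, 0]

    w_stat, p_value = stats.levene(sample1, sample2, sample3)
    print(f"W = {w_stat:.2f}, p = {p_value:.4f}")
    # A large p-value gives no evidence against equal variances.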
8.3.2 Logic Behind One Way ANOVA
In order to test whether pairs of sample means differ by more than would be expected by
chance, we might conduct a series of t-tests on the K sample means; however, this approach
has a major problem.
When we use a t-test once, there is a chance of a Type I error. The magnitude of this error
is usually 5%. By running two tests on the same data we increase the chance of
making an error to roughly 10%. For a third test, it rises to roughly 15%, and so on. These
are unacceptable error rates. The number of t-tests needed to compare all possible pairs of
means would be:
K(K − 1) / 2

where K = number of means
When more than one t-test is run, each at a specific level of significance such as α = .05,
the probability of making one or more Type I errors in the series of t-tests is greater than
α. The inflated Type I error rate is determined as:

1 − (1 − α)^c

where α = level of significance for each separate t-test
and c = number of independent t-tests
An ANOVA controls the chance of these errors so that the Type I error rate remains at 5%
and a researcher can be more confident about the results.
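Both formulas are easy to check numerically. The following is a minimal sketch in Python;
the choice of four groups is arbitrary.

    # Sketch: pairwise t-test count and the inflated Type I error rate.
    def num_pairwise_tests(k):
        return k * (k - 1) // 2           # K(K - 1) / 2

    def familywise_error(alpha, c):
        return 1 - (1 - alpha) ** c       # 1 - (1 - alpha)^c

    k = 4                                  # four group means (arbitrary)
    c = num_pairwise_tests(k)              # 6 pairwise tests
    print(c, round(familywise_error(0.05, c), 3))   # prints: 6 0.265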
8.3.3 Procedure for Using ANOVA
In using ANOVA manually, we first need to compute a total sum of squares (SS Total) and
then partition this value into two components: between treatments and within treatments.
This analysis is outlined in Fig. 8.2.
SS Total = ∑X² − G²/N, partitioned into:
SS Between Treatments and SS Within Treatments = ∑ SS inside each treatment

Fig. 8.2: Partitioning the total sum of squares (SS Total) for the independent-measures ANOVA.
1) The Total Sum of Squares (SS Total)
It is the total sum of squares for the entire set of N scores. It can be calculated using
the computational formula for SS:

SS Total = ∑X² − (∑X)²/N

But ∑X = G (the grand total of all scores), so (∑X)² = G², and then:

SS Total = ∑X² − G²/N
2) Sum of Squares within Treatments (SS Within)
The sum of squares inside each treatment can be calculated as:

SS Within = SS1 + SS2 + … + SSk
         = ∑ SS inside each treatment
3) Sum of Squares Between Treatments (SS Between)
The computational formula for SS Between is:

SS Between = ∑(T²/n) − G²/N

where T = the total of the scores in each treatment and n = the number of scores in that
treatment.
Now
SS Total = SS Between + SS Within
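As a check on these formulas, the following sketch computes SS Total, SS Between, and
SS Within for the Table 8.1 data in Python, confirms that the two components add up to
the total, and forms the resulting F-ratio.

    # Sketch: sum-of-squares partition for the Table 8.1 data.
    groups = [[0, 1, 3, 1, 0],    # treatment 1 (50°)
              [4, 3, 6, 3, 4],    # treatment 2 (70°)
              [1, 2, 2, 0, 0]]    # treatment 3 (90°)

    scores = [x for g in groups for x in g]
    N = len(scores)                              # 15
    G = sum(scores)                              # grand total = 30

    ss_total = sum(x * x for x in scores) - G * G / N                    # 46.0
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G * G / N   # 30.0
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)                                     # 16.0

    assert abs(ss_total - (ss_between + ss_within)) < 1e-9   # parts equal the total

    df_between = len(groups) - 1                 # 2
    df_within = N - len(groups)                  # 12
    F = (ss_between / df_between) / (ss_within / df_within)
    print(ss_total, ss_between, ss_within, round(F, 2))      # 46.0 30.0 16.0 11.25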
8.4 Multiple Comparison Procedures
Although in one-way ANOVA R² measures the effect size, it suffers one possible limitation:
it does not indicate which group may be responsible for a significant effect. All that a
significant R² and F-statistic say is that the means for the groups are unlikely to have
been sampled from a single hat of means. Unfortunately, there is no simple, unequivocal
statistical solution to the problem of comparing the different levels of an ANOVA factor.
A number of statistical methods have been developed to test for differences in means
among the levels of an ANOVA factor. Collectively these are known as multiple
comparison procedures (MCPs) or, sometimes, as post hoc (i.e., after the fact) tests. These
tests should be regarded more as an afterthought than as a rigorous examination of
pre-specified hypotheses.
Most of the multiple-comparison methods are meant for pair-wise comparisons of group
means, to determine which means differ significantly from which others. The main purpose of
most multiple-comparison procedures is to control the overall significance level for some
set of inferences performed as a follow-up to ANOVA. This overall significance level
is the probability, conditional on all the null hypotheses being tested being true, of
rejecting at least one of them, or equivalently, of having at least one confidence interval
not include the true value.
The various methods differ in how well they control the overall significance level
and in their relative power. Commonly used methods and their relative strengths are given
below; a sketch of one of them follows the list.
Bonferroni – It is extremely general and simple, but often not powerful.
Tukey's – It is the best for all possible pair-wise comparisons when sample sizes are
unequal or confidence intervals are needed. It is also very good even with equal
sample sizes without confidence intervals.
Stepdown – It is the most powerful for all possible pair-wise comparisons when
sample sizes are equal.
Dunnett's – It is suitable for comparing one sample to each of the others, but not
for comparing the others to each other.
Hsu's MCB – It compares each mean to the best of the other means.
Scheffé's – It is suitable for unplanned contrasts among sets of means.
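For pair-wise follow-ups in Python, SciPy (version 1.8 or later) provides a Tukey HSD
implementation; the following is a minimal sketch using the Table 8.1 data. SPSS offers
the same tests through its post hoc dialog.

    # Sketch: Tukey's HSD post hoc test (requires SciPy >= 1.8).
    from scipy.stats import tukey_hsd

    sample1 = [0, 1, 3, 1, 0]
    sample2 = [4, 3, 6, 3, 4]
    sample3 = [1, 2, 2, 0, 0]

    result = tukey_hsd(sample1, sample2, sample3)
    print(result)   # pairwise mean differences with adjusted p-values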
8.5 Self Assessment Questions
Q. 1 When will you use ANOVA in your research?
Q. 2 Write down the logic behind using ANOVA.
Q. 3 Write a short note on one-way ANOVA.
Q. 4 Write down the main assumptions underlying one-way ANOVA.
Q. 5 What are multiple comparison procedures?
Q. 6 What is the basic purpose of multiple comparison procedures?
8.6 Activities
1. Suppose you have to see the difference between three groups. Discuss with your
colleagues and select an appropriate statistical test.
2. In your study, the treatment you used had no effect. What will be the F-ratio?
8.7 Bibliography
Dietz, T., & Kalof, L. (2009). Introduction to Social Statistics. UK: Wiley-Blackwell.
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to Design and Evaluate Research
in Education (8th ed.). New York: McGraw-Hill.
Pallant, J. (2005). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using
SPSS for Windows (Version 12). Australia: Allen & Unwin.