Module 3

The document discusses the analysis of variance (ANOVA) technique for analyzing single-factor experiments. It outlines the basics of one-way or single-factor ANOVA, including calculating the sum of squares between and within samples, determining the F-ratio to test for significant differences between means, and using Fisher's least significant difference (LSD) method for multiple comparisons following a significant ANOVA. An example experiment and calculations are provided to illustrate a one-way ANOVA analysis.


DESIGN AND ANALYSIS OF

SINGLE-FACTOR EXPERIMENTS:
THE ANALYSIS OF VARIANCE

Delivered to: CCIIDI, Ministry of Industry

By
Dr. Beteley Tekola (Asst. Professor)
Associate Director for Research, AAiT
School of Chemical and Bio Engineering
Addis Ababa Institute of Technology
OUTLINE
 Introduction
 Introduction to analysis of variance (ANOVA)

 One way (single factor) analysis of variance

 Multiple comparisons (Fisher’s LSD method)

 Model adequacy checking

 Regression model analysis

 Adequacy of the regression model


INTRODUCTION
 Experiments are a natural part of the engineering and
scientific decision-making process.

 If there are only two curing methods (i.e., two factor levels) of interest, the experiment could be designed and analyzed using two-sample statistical hypothesis-testing methods.

 Many single-factor experiments require that more than two levels of the factor be considered.
INTRODUCTION

The a factor levels in the experiment could have been chosen in two different ways:
1. Fixed-effects model
The experimenter could have specifically chosen the a treatments. In this situation, we wish to test hypotheses about the treatment means, and conclusions cannot be extended to similar treatments that were not considered. In addition, we may wish to estimate the treatment effects.
INTRODUCTION

2. Random-effects (components of variance) model
The a treatments could be a random sample from a larger population of treatments. In this situation, we would like to be able to extend the conclusions (which are based on the sample of treatments) to all treatments in the population, whether or not they were explicitly considered in the experiment.
 Here the treatment effects τi are random variables,
and knowledge about the particular ones investigated
is relatively unimportant.
 Instead, we test hypotheses about the variability of the
τi and try to estimate this variability.
ANOVA
 Analysis of variance (ANOVA) is an extremely useful technique for research in the fields of engineering, economics, biology, education, psychology, sociology, and business/industry, as well as in several other disciplines.
 This technique is used when multiple sample cases
are involved.
 As stated earlier, the significance of the difference between the means of two samples can be judged through either the z-test or the t-test, but difficulty arises when we need to examine the significance of the differences among more than two sample means at the same time.
ANOVA
 The ANOVA technique enables us to perform this
simultaneous test and as such is considered to be an
important tool of analysis in the hands of a
researcher.
 Using this technique, one can draw inferences about
whether the samples have been drawn from
populations having the same mean.
 The essence of ANOVA is that the total amount of
variation in a set of data is broken down into two
types,
 that amount which can be attributed to chance and
 that amount which can be attributed to specified causes
THE BASIC PRINCIPLE OF ANOVA
The basic principle of ANOVA is to test for differences among the means of the populations by comparing the amount of variation within each of these samples with the amount of variation between the samples.
The ANOVA assumes that the observations are
normally and independently distributed with the same
variance for each treatment (factor level)
ANOVA
 One-way (or single factor) ANOVA
 Two-way (or multi factor) ANOVA
ONE-WAY (SINGLE FACTOR) ANOVA


 One-way (or single factor) ANOVA: Under the one-way ANOVA, we consider only one factor; several samples can occur within that factor (one for each level), and we then determine whether there are differences among the levels of that factor.
 Typical Data for a Single-Factor Experiment
THE LINEAR STATISTICAL MODEL
 We may describe the single-factor observations by the linear statistical model

Yij = µ + τi + Єij,   i = 1, 2, …, a;   j = 1, 2, …, n

where
Yij is a random variable denoting the (ij)th observation,
µ is a parameter common to all treatments called the overall mean,
τi is a parameter associated with the ith treatment (the ith treatment effect), and
Єij is a random error component.
 Display of the model for the completely randomized single-factor
experiment.
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE

 The technique involves the following steps:


(i) Obtain the mean of each sample, i.e., obtain
X̄1, X̄2, X̄3, …, X̄k
when there are k samples.
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
(ii) Work out the mean of the sample means as follows:

X̄ = (X̄1 + X̄2 + … + X̄k) / k

(iii) Calculate the sum of squares for variance between the samples (or SS between): take the deviations of the sample means from the mean of the sample means, square each deviation, multiply it by the number of items in the corresponding sample, and then obtain the total. Symbolically, this can be written:

SS between = n1(X̄1 − X̄)² + n2(X̄2 − X̄)² + … + nk(X̄k − X̄)²
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
(iv) Divide the result of step (iii) by the degrees of freedom between the samples to obtain the variance or mean square (MS) between samples. Symbolically, this can be written:

MS between = SS between / (k – 1)

where (k – 1) represents the degrees of freedom (d.f.) between samples.
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
(v) Calculate the sum of squares for variance within samples (or SS within): obtain the deviations of the values of the sample items for all the samples from the corresponding means of the samples, square these deviations, and then obtain their total. Symbolically, this can be written:

SS within = Σ(X1j − X̄1)² + Σ(X2j − X̄2)² + … + Σ(Xkj − X̄k)²
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
(vi) Divide the result of step (v) by the degrees of freedom within samples to obtain the variance or mean square (MS) within samples. Symbolically, this can be written:

MS within = SS within / (n – k)

where (n – k) represents the degrees of freedom within samples,
n = total number of items in all the samples, i.e., n1 + n2 + … + nk, and
k = number of samples.
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
(vii) As a check, the sum of squares of deviations for total variance can also be worked out by adding the squares of the deviations of the individual items in all the samples from the mean of the sample means. Symbolically, this can be written:

SS total = ΣΣ(Xij − X̄)²

This total should be equal to the sum of the results of steps (iii) and (v) explained above, i.e.,

SS total = SS between + SS within
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
 The degrees of freedom for total variance will be equal to the number of items in all samples minus one, i.e., (n – 1). The degrees of freedom for between and within must add up to the degrees of freedom for total variance, i.e.,

(n – 1) = (k – 1) + (n – k)

 This fact explains the additive property of the ANOVA technique.
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
(viii) Finally, the F-ratio may be worked out as under:

F = MS between / MS within

 This ratio is used to judge whether the difference among several sample means is significant or is just a matter of sampling fluctuations. For this purpose we look into the F table, which gives the values of F for given degrees of freedom at different levels of significance.
ONE-WAY (SINGLE FACTOR) ANOVA
TECHNIQUE
 If the worked-out value of F, as stated above, is less than the table value of F, the difference is taken as insignificant, i.e., due to chance, and the null hypothesis of no difference between sample means stands.
 If the calculated value of F is equal to or greater than its table value, the difference is considered significant (which means the samples could not have come from the same universe), and the conclusion may be drawn accordingly.
 The higher the calculated value of F is above the table value, the more definite and sure one can be about one's conclusions.
TABLE FOR ONE-WAY ANOVA (k SAMPLES HAVING IN ALL n ITEMS)

Source of variation    SS            d.f.       MS                        F-ratio
Between samples        SS between    (k – 1)    SS between / (k – 1)      MS between / MS within
Within samples         SS within     (n – k)    SS within / (n – k)
Total                  SS total      (n – 1)
EXAMPLE 1
 A manufacturer of paper used for making grocery bags is
interested in improving the tensile strength of the product.
Product engineering thinks that tensile strength is a
function of the hardwood concentration in the pulp and that
the range of hardwood concentrations of practical interest is
between 5 and 20%. A team of engineers responsible for the
study decides to investigate four levels of hardwood
concentration: 5%, 10%, 15%, and 20%. They decide to
make up six test specimens at each concentration level,
using a pilot plant. All 24 specimens are tested on a
laboratory tensile tester, in random order.
 The engineer wishes to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper, at α = 0.05:
 H0: µ1 = µ2 = µ3 = µ4
H1: some means are different.
EXAMPLE 1 (CONTINUED)
(Data table with the observed tensile strengths, treatment totals yi., and grand total y.. not reproduced.)

 Since the computed value of the test statistic exceeds f0.01,3,20 = 4.94, we reject H0 and conclude that hardwood concentration in the pulp significantly affects the mean strength of the paper.
 We can also find a P-value for this test statistic. Since the P-value is considerably smaller than α = 0.01, we have strong evidence to conclude that H0 is not true.
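For reference, the critical value quoted above can be reproduced with SciPy; a minimal sketch follows (the observed statistic f0 below is a hypothetical placeholder, since the computed value is not reproduced in the text):

```python
# Sketch: reproducing the critical value and finding a P-value with SciPy.
from scipy import stats

df_num, df_den = 3, 20                        # (k - 1) and (n - k) for Example 1
f_crit = stats.f.ppf(1 - 0.01, df_num, df_den)
print(f"f_0.01,3,20 = {f_crit:.2f}")          # prints 4.94, as on the slide

f0 = 19.6                                     # hypothetical computed statistic
p_value = stats.f.sf(f0, df_num, df_den)      # upper-tail probability
print(f"P-value = {p_value:.2e}")
```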
MULTIPLE COMPARISONS FOLLOWING
THE ANOVA - LSD

 When the null hypothesis is rejected in the ANOVA, we know that some of the treatment or factor level means are different.
 However, the ANOVA doesn’t identify which means are different. Methods for investigating this issue are called multiple comparisons methods.
 One such method is Fisher’s least significant difference (LSD) method.
MULTIPLE COMPARISONS FOLLOWING
THE ANOVA - LSD

 The Fisher LSD method compares all pairs of means with the null hypotheses H0: µi = µj (for all i ≠ j) using the t-statistic

t0 = (ȳi. − ȳj.) / √( MSE (1/ni + 1/nj) )

 Assuming a two-sided alternative hypothesis, the pair of means i and j would be declared significantly different if

|ȳi. − ȳj.| > LSD,   where LSD = tα/2, N−a √( 2 MSE / n )

for equal sample sizes n per treatment.
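A minimal sketch of the LSD procedure for equal sample sizes follows; the treatment means, MSE, and sample sizes are illustrative assumptions, not values from the slides:

```python
# Sketch of Fisher's LSD with equal sample sizes (illustrative inputs).
from itertools import combinations
import numpy as np
from scipy import stats

means = {"5%": 10.0, "10%": 15.7, "15%": 17.0, "20%": 21.2}  # assumed means
mse = 6.51                   # assumed error mean square from the ANOVA
n_per, a = 6, 4              # observations per treatment, number of treatments
df_error = a * (n_per - 1)   # N - a error degrees of freedom

t_crit = stats.t.ppf(1 - 0.05 / 2, df_error)
lsd = t_crit * np.sqrt(2 * mse / n_per)
print(f"LSD = {lsd:.2f}")

for (name_i, m_i), (name_j, m_j) in combinations(means.items(), 2):
    diff = abs(m_i - m_j)
    verdict = "significantly different" if diff > lsd else "not different"
    print(f"{name_i} vs {name_j}: |diff| = {diff:.2f} -> {verdict}")
```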
ADEQUACY OF THE MODEL
MODEL ADEQUACY CHECKING
The analysis of variance assumes that the observations are
normally and independently distributed with the same
variance for each treatment or factor level.
 These assumptions should be checked by examining the
residuals.
 A residual is the difference between an observation yij and its estimated (or fitted) value from the statistical model being studied, denoted as eij = yij − ŷij.

 Using ŷij = ȳi. to calculate each residual essentially removes the effect of hardwood concentration from the data; consequently, the residuals contain information about unexplained variability.
MODEL ADEQUACY CHECKING
 Examination of the residuals should be an automatic
part of any analysis of variance.

If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns.
 Through a study of residuals, many types of model inadequacies and violations of the underlying assumptions can be discovered.
MODEL ADEQUACY CHECKING

 As an approximate check of normality, the experimenter can construct a frequency histogram of the residuals or a normal probability plot of residuals.
 Many computer programs will produce a normal probability plot of residuals, and since the sample sizes are often too small for a histogram to be meaningful, the normal probability plotting method is preferred.
 The normality assumption can thus be checked by constructing a normal probability plot of the residuals.
MODEL ADEQUACY CHECKING
The Normality Assumption
 To check the assumption of equal variances at each factor level, plot the residuals against the factor levels and compare the spread in the residuals.
 It is also useful to plot the residuals against ŷij (sometimes called the fitted value); the variability in the residuals should not depend in any way on the value of ŷij.
MODEL ADEQUACY CHECKING
 Most statistics software packages will construct these
plots on request.
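As an illustration, the two standard plots can be produced with SciPy and Matplotlib; the data below are simulated for the sketch:

```python
# Sketch: normal probability plot and residuals-vs-fitted plot (simulated data).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(loc, 2.0, size=6) for loc in (10, 16, 17, 21)]
fitted = np.concatenate([np.full(g.size, g.mean()) for g in groups])
residuals = np.concatenate([g - g.mean() for g in groups])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(residuals, plot=ax1)      # normal probability plot
ax1.set_title("Normal probability plot of residuals")
ax2.scatter(fitted, residuals)           # spread should not depend on the fit
ax2.axhline(0.0, linestyle="--")
ax2.set(xlabel="Fitted value", ylabel="Residual", title="Residuals vs fitted")
plt.tight_layout()
plt.show()
```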
PRACTICAL INTERPRETATION OF
RESULTS
 After
 conducting the experiment,
 performing the statistical analysis, and
 investigating the underlying assumptions,

 the experimenter is ready to draw practical conclusions about the problem he or she is studying.
 Often this is relatively easy, and certainly in the simple
experiments we have considered so far, this might be
done somewhat informally, perhaps by inspection of
graphical displays.
 However, in some cases, more formal techniques need to
be applied.
REGRESSION MODEL
 Many problems in engineering and science involve
exploring the relationships between two or more
variables.
Regression analysis is a statistical technique that is
very useful for these types of problems.
 With a quantitative factor such as time, the
experimenter is usually interested in the entire range
of values used, particularly the response from a
subsequent run at an intermediate factor level.
 That is, if the levels 1.0, 2.0, and 3.0 hours are used in
the experiment, we may wish to predict the response at
2.5 hours.
REGRESSION MODEL
 Thus, the experimenter is frequently interested in
developing an interpolation equation for the response
variable in the experiment.
 This equation is an empirical model of the process
that has been studied.
 The general approach to fitting empirical models is
called regression analysis.
SIMPLE LINEAR REGRESSION MODEL
Suppose the purity of oxygen (Y) produced in a chemical distillation process depends on the percentage of hydrocarbons (x) present in the main condenser of the distillation unit. A simple linear regression model of this relationship is

Y = β0 + β1x + Є

 The estimates of β0 and β1 should result in a line that is a “best fit” to the data.
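A minimal sketch of the least-squares fit follows; the data are made up for illustration, and b1 = Sxy/Sxx and b0 = ȳ − b1x̄ are the standard least-squares formulas:

```python
# Sketch: least-squares estimates for a simple linear regression (made-up data).
import numpy as np

x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23])  # hydrocarbon %
y = np.array([90.0, 89.1, 91.4, 93.7, 96.7, 94.5, 87.6, 91.8])  # purity

xbar, ybar = x.mean(), y.mean()
sxx = ((x - xbar) ** 2).sum()                 # corrected sum of squares of x
sxy = ((x - xbar) * (y - ybar)).sum()         # corrected cross-product
b1 = sxy / sxx                                # slope estimate
b0 = ybar - b1 * xbar                         # intercept estimate
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```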
MULTIPLE REGRESSION MODEL

 Many applications of regression analysis involve situations in which there is more than one regressor variable.
 Example: suppose that the effective life of a cutting tool depends on the cutting speed (x1) and the tool angle (x2). A multiple regression model that might describe this relationship is

Y = β0 + β1x1 + β2x2 + Є

 The estimates of β0, β1, and β2 should result in a surface (a plane) that is a “best fit” to the data.
MULTIPLE REGRESSION MODEL
 In general, the dependent variable or response Y may be related to k independent or regressor variables, x. The model

Y = β0 + β1x1 + β2x2 + … + βkxk + Є

is called a multiple linear regression model with k regressor variables.
 The parameters βj, j=0,1,…, k, are called the regression
coefficients.
 Multiple linear regression models are often used as
approximating functions. That is, the true functional
relationship between Y and x1, x2,…, xk is unknown, but
over certain ranges of the independent variables the
linear regression model is an adequate approximation.
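A minimal sketch of fitting such a model by least squares with NumPy follows (the data are simulated; the design matrix includes a column of ones for the intercept β0):

```python
# Sketch: multiple linear regression by least squares (simulated data).
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(100, 200, n)                 # e.g., cutting speed
x2 = rng.uniform(10, 30, n)                   # e.g., tool angle
y = 5.0 + 0.04 * x1 + 0.3 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
print("b0, b1, b2 =", np.round(beta, 3))
```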
MULTIPLE REGRESSION MODEL

 Models that are more complex in structure than the above equation may often still be analyzed by multiple linear regression techniques.
 For example, consider the cubic polynomial model in one regressor variable

Y = β0 + β1x + β2x² + β3x³ + Є

 If we let x1 = x, x2 = x², and x3 = x³, this is a multiple linear regression model with three regressor variables.
MULTIPLE REGRESSION MODEL
 Models that include interaction effects may also be
analyzed by multiple linear regression methods.
 Interaction implies that the effect produced by changing
one variable (x1, say) depends on the level of the other
variable (x2)
 An interaction between two variables can be represented by a cross-product term in the model, such as

Y = β0 + β1x1 + β2x2 + β12x1x2 + Є

which (letting x3 = x1x2 and β3 = β12) is a linear regression model.
MULTIPLE REGRESSION MODEL

 Consider the second-order model with interaction

Y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + Є
MULTIPLE REGRESSION MODEL
 The open circles on the
graph are the mean tensile
strengths at each value of
cotton weight percentage x.
 From examining the scatter
diagram, it is clear that the
relationship between tensile
strength and weight
percentage cotton is not
linear.
 As a first approximation, we could try fitting a quadratic model to the data, say

Y = β0 + β1x + β2x² + Є
HYPOTHESIS TESTS IN SIMPLE
LINEAR REGRESSION
 As in the simple linear regression case, hypothesis testing requires that the error terms εi in the regression model are normally and independently distributed with mean zero and variance σ².
 Suppose we wish to test the hypothesis that the slope β1 equals a constant, say β1,0 (the constant can be 0).
 A similar procedure can be used to test hypotheses about the intercept.
 The appropriate hypotheses are

H0: β1 = β1,0
H1: β1 ≠ β1,0
HYPOTHESIS TESTS IN SIMPLE
LINEAR REGRESSION

 The test statistic for H0 is

T0 = (β̂1 − β1,0) / se(β̂1)

where se(β̂1) is the estimated standard error of the slope. We reject H0 if the computed value |t0| is greater than tα/2, n−2.
 Failure to reject H0: β1 = 0 is equivalent to concluding that there is no linear relationship between x and Y.
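A minimal sketch of this t test for H0: β1 = 0 follows, with made-up data:

```python
# Sketch: t test for H0: beta1 = 0 in simple linear regression (made-up data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])

n = x.size
sxx = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma2_hat = (resid ** 2).sum() / (n - 2)     # estimate of sigma^2
se_b1 = np.sqrt(sigma2_hat / sxx)             # standard error of the slope

t0 = b1 / se_b1                               # test statistic for beta1,0 = 0
p = 2 * stats.t.sf(abs(t0), n - 2)            # two-sided P-value
print(f"t0 = {t0:.2f}, P = {p:.4f}")
```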
HYPOTHESIS TESTS IN SIMPLE
LINEAR REGRESSION
 The hypothesis H0: β1 = 0 is not rejected. This may imply either that
 x is of little value in explaining the variation in Y, or
 the true relationship between x and Y is not linear.
HYPOTHESIS TESTS IN SIMPLE
LINEAR REGRESSION
 The hypothesis H0: β1 = 0 is rejected, i.e., x is of value in explaining the variability in Y. This may imply either that
 the straight-line model is adequate, or
 although there is a linear effect of x, better results could be obtained with the addition of higher-order polynomial terms in x.
TEST FOR SIGNIFICANCE OF REGRESSION
 In multiple linear regression problems, certain tests of
hypotheses about the model parameters are useful in
measuring model adequacy.
 The test for significance of regression is a test to determine
whether a linear relationship exists between the response
variable y and a subset of the regressor variables x1, x2,…,
xk.
 The appropriate hypotheses are

H0: β1 = β2 = … = βk = 0
H1: βj ≠ 0 for at least one j

 Rejection of H0 implies that at least one of the regressor variables x1, x2, …, xk contributes significantly to the model.
HYPOTHESIS TESTS IN MULTIPLE
LINEAR REGRESSION
 The test statistic for H0 is

F0 = (SSR / k) / (SSE / (n − p)) = MSR / MSE

 We should reject H0 if the computed value of the test statistic f0 is greater than fα, k, n−p (where p = k + 1 is the number of model parameters).
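A minimal sketch of this F test follows, with simulated data for k = 2 regressors:

```python
# Sketch: test for significance of regression (simulated data, k = 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n), rng.uniform(0, 1, n)])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(0, 0.3, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
sse = ((y - y_hat) ** 2).sum()                # error sum of squares
ssr = ((y_hat - y.mean()) ** 2).sum()         # regression sum of squares

p = k + 1                                     # number of model parameters
f0 = (ssr / k) / (sse / (n - p))
f_crit = stats.f.ppf(0.95, k, n - p)
print(f"f0 = {f0:.1f}, f_crit(0.05) = {f_crit:.2f}")
```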
TESTS ON INDIVIDUAL REGRESSION
COEFFICIENTS AND SUBSETS OF COEFFICIENTS
 We are frequently interested in testing hypotheses on
the individual regression coefficients.

 Such tests would be useful in determining the potential value of each of the regressor variables in the regression model.
 For example, the model might be more effective with
 the inclusion of additional variables, or
 perhaps the deletion of one or more of the regressors presently in the model.
TESTS ON INDIVIDUAL REGRESSION
COEFFICIENTS AND SUBSETS OF COEFFICIENTS
 The hypotheses for testing the significance of any individual regression coefficient, say βj, are

H0: βj = 0
H1: βj ≠ 0

 If H0: βj = 0 is not rejected, this indicates that the regressor xj can be deleted from the model.
 If H0: βj = 0 is rejected, this indicates that the variable xj contributes significantly to the model. The test statistic for this hypothesis is

T0 = β̂j / se(β̂j)

The null hypothesis H0 is rejected if |t0| > tα/2, n−p.
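In practice these per-coefficient t tests are printed by most regression software; the sketch below uses statsmodels with simulated data in which x2 has no real effect:

```python
# Sketch: per-coefficient t tests via statsmodels OLS (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # x2 has no real effect here
y = 3.0 + 1.5 * x1 + rng.normal(0, 0.5, n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.tvalues)                       # t0 for each coefficient
print(fit.pvalues)                       # expect x2's P-value to be large
```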
ADEQUACY OF THE REGRESSION
MODEL
 Residual Analysis
 Coefficient of Determination (R²)
ADEQUACY OF THE REGRESSION
MODEL
 Assumptions
 Fitting a regression model requires several assumptions.
Estimation of the model parameters requires the
assumption that the errors are uncorrelated random
variables with mean zero and constant variance.
 Tests of hypotheses and interval estimation require that the
errors be normally distributed.
 In addition, we assume that the order of the model is
correct; that is, if we fit a simple linear regression model, we
are assuming that the phenomenon actually behaves in a
linear or first-order manner.
 The analyst should always consider the validity of these
assumptions to be doubtful and conduct analyses to examine
the adequacy of the model that has been tentatively
entertained.
RESIDUAL ANALYSIS
 The residuals from a regression model are ei = yi − ŷi, i = 1, 2, …, n, where yi is an actual observation and ŷi is the corresponding fitted value from the regression model.
 Analysis of the residuals is frequently helpful in checking
the assumption that the errors are approximately
normally distributed with constant variance, and in
determining whether additional terms in the model
would be useful.
 As an approximate check of normality, the experimenter
can construct a frequency histogram of the residuals or a
normal probability plot of residuals.
RESIDUAL ANALYSIS
 It is frequently helpful to plot the residuals
(1) in time sequence (if known),
(2) against the fitted values ŷi, and
(3) against the independent variable x.

 Patterns for residual plots:
(a) satisfactory,
(b) funnel,
(c) double bow,
(d) nonlinear.
Patterns (b) and (c) indicate inequality of variance; pattern (d) indicates model inadequacy.
COEFFICIENT OF DETERMINATION (R²)
 The coefficient of determination,

R² = SSR / SST = 1 − SSE / SST

is often used to judge the adequacy of a regression model.
 Subsequently, we will see that in the case where X and Y are jointly distributed random variables, R² is the square of the correlation coefficient between X and Y.
 The statistic R² should be used with caution, because it is always possible to make R² unity by simply adding enough terms to the model.
 For example, we can obtain a “perfect” fit to n data points with a polynomial of degree n − 1.
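A minimal sketch computing R² for a fitted line, and demonstrating the caution above, follows (data made up):

```python
# Sketch: computing R^2 and illustrating the caution above (made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)                  # least-squares straight line
y_hat = b0 + b1 * x
sse = ((y - y_hat) ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()
print(f"R^2 = {1 - sse / sst:.4f}")

# A polynomial of degree n-1 passes through all n points: R^2 = 1 exactly.
coeffs = np.polyfit(x, y, x.size - 1)
print(np.allclose(np.polyval(coeffs, x), y))  # True: a "perfect" fit
```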
TRANSFORMATIONS TO A STRAIGHT
LINE
 We occasionally find that the straight-line regression model

Y = β0 + β1x + Є

is inappropriate because the true regression function is nonlinear.
 Sometimes nonlinearity is visually determined from the
scatter diagram, and sometimes, because of prior
experience or underlying theory, we know in advance that
the model is nonlinear.
 Occasionally, a scatter diagram will exhibit an apparent
nonlinear relationship between Y and x.
 In some of these situations, a nonlinear function can be
expressed as a straight line by using a suitable
transformation.
 Such nonlinear models are called intrinsically linear
TRANSFORMATIONS TO A STRAIGHT
LINE
 As an example of a nonlinear model that is intrinsically linear, consider the exponential function

Y = β0 e^(β1x) Є

 This function is intrinsically linear, since it can be transformed to a straight line by a logarithmic transformation:

ln Y = ln β0 + β1x + ln Є
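A minimal sketch of fitting this intrinsically linear model via the log transformation follows (data simulated):

```python
# Sketch: fitting Y = b0 * exp(b1 * x) by a log transformation (simulated).
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.5, 5.0, 20)
y = 2.0 * np.exp(0.8 * x) * rng.lognormal(0.0, 0.05, x.size)

# ln Y = ln b0 + b1 * x is a straight line in x, so fit it by least squares.
b1, ln_b0 = np.polyfit(x, np.log(y), 1)
print(f"fitted model: y = {np.exp(ln_b0):.2f} * exp({b1:.2f} x)")
```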
TRANSFORMATIONS TO A STRAIGHT
LINE
 Reciprocal transformation: consider the nonlinear model

Y = β0 + β1(1/x) + Є

 By using the reciprocal transformation z = 1/x, the model is linearized to

Y = β0 + β1z + Є
DESIGN OF EXPERIMENTS WITH
SEVERAL FACTORS

Delivered to: CCIIDI, Ministry of Industry

By
Dr. Beteley Tekola (Asst. Professor)
Associate Director for Research, AAiT
School of Chemical and Bio Engineering
Addis Ababa Institute of Technology
FACTORIAL DESIGN
 Many experiments involve the study of the effects of
two or more factors.
 In general, factorial designs are most efficient for
this type of experiment.
 By a factorial design, we mean that in each
complete trial or replication of the experiment all
possible combinations of the levels of the factors are
investigated.
 For example, if there are a levels of factor A and b
levels of factor B, each replicate contains all ab
treatment combinations.
 When factors are arranged in a factorial design,
they are often said to be crossed.
FACTORIAL EXPERIMENTS
 If there are two factors A and B with a levels of factor
A and b levels of factor B, each replicate contains all ab
treatment combinations.
 The effect of a factor is defined as the change in
response produced by a change in the level of the
factor.
 It is called a main effect because it refers to the
primary factors in the study.
FACTORIAL EXPERIMENTS
 This is a factorial experiment with two factors, A and B, each at
two levels (Alow, Ahigh, and Blow, Bhigh).
 The main effect of factor A is the difference between the average response at the high level of A and the average response at the low level of A, or

A = ȳ(A high) − ȳ(A low)

 That is, changing factor A from the low level to the high level causes an average response increase of 20 units.
 Similarly, the main effect of B is the difference between the average response at the high level of B and the average response at the low level of B:

B = ȳ(B high) − ȳ(B low)
DESIGN OF EXPERIMENTS (DOE)
IN MANUFACTURING INDUSTRIES
 Statistical methodology for systematically
investigating a system's input-output
relationship to achieve one of several goals:
 Identify important design variables (screening)
 Optimize product or process design
 Achieve robust performance

 Key technology in product and process development
 Used extensively in manufacturing industries
 Part of basic training programs such as Six Sigma
DESIGN AND ANALYSIS OF EXPERIMENTS
A HISTORICAL OVERVIEW
 Factorial and fractional factorial designs (1920+) → Agriculture
 Sequential designs (1940+) → Defense
 Response surface designs for process optimization (1950+) → Chemical
 Robust parameter design for variation reduction (1970+) → Manufacturing and Quality Improvement
 Virtual (computer) experiments using computational models (1990+) → Automotive, Semiconductor, Aircraft, …
(FULL) FACTORIAL DESIGNS
 All possible combinations of the factor settings
 General: I x J x K … combinations
 Two-level designs: 2 x 2, 2 x 2 x 2, …
 We will focus on two-level designs, which are OK in the screening phase, i.e., for identifying important factors.
FULL FACTORIAL DESIGN: EXAMPLE
(Figures: a two-level design matrix in coded units and example response data, not fully reproduced.)
 Design-matrix algebra in coded units: −1 × −1 = +1.
 A factor effect is the difference between the average response at its high level and the average response at its low level, e.g.,
(9 + 9 + 3 + 3)/4 − (7 + 9 + 8 + 8)/4 = 6 − 8 = −2
TWO FACTOR MODEL

 The observations of the simplest two-factor model may be described by the linear statistical model

Yijk = µ + τi + βj + (τβ)ij + εijk

 where µ is the overall mean effect,
 τi is the effect of the ith level of factor A,
 βj is the effect of the jth level of factor B,
 (τβ)ij is the effect of the interaction between A and B, and
 εijk is a random error component having a normal distribution
TWO FACTOR MODEL

 The analysis of variance (ANOVA), in particular, will continue to be used as one of the primary tools for statistical data analysis.
 We are interested in testing the hypotheses of
 no main effect for factor A,
 no main effect for B, and
 no AB interaction effect
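A minimal sketch of this two-factor ANOVA using statsmodels follows (the data frame is simulated; the formula mirrors the model Yijk = µ + τi + βj + (τβ)ij + εijk):

```python
# Sketch: two-factor ANOVA with interaction via statsmodels (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(5)
rows = [
    {"A": a, "B": b, "y": rng.normal(10.0, 1.0)}
    for a in ("a1", "a2", "a3")        # levels of factor A
    for b in ("b1", "b2")              # levels of factor B
    for _ in range(3)                  # replicates per cell
]
df = pd.DataFrame(rows)

fit = ols("y ~ C(A) + C(B) + C(A):C(B)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))   # F tests for A, B, and AB
```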
EXAMPLE

 ANOVA table for a two-factor, fixed-effects model


EXAMPLE 1
 Aircraft primer paints are applied to aluminum
surfaces by two methods: dipping and spraying. The
purpose of the primer is to improve paint adhesion,
and some parts can be primed using either
application method. The process engineering group
responsible for this operation is interested in
learning whether three different primers differ in
their adhesion properties.
 A factorial experiment was performed to investigate
the effect of paint primer type and application
method on paint adhesion. For each combination of
primer type and application method, three specimens
were painted, then a finish paint was applied, and
the adhesion force was measured.
REGRESSION MODEL
 Suppose that both of our design factors are
quantitative (such as temperature, pressure, time,
etc.), then a regression model representation of
the two-factor factorial experiment could be written as

y = β0 + β1x1 + β2x2 + β12x1x2 + Є

where y is the response; the β's are parameters whose values are to be determined; x1 is a variable that represents factor A; x2 is a variable that represents factor B; x1x2 represents the interaction between x1 and x2; and Є is a random error term.
