Formative Evaluation:
What, Why, When, and How
Abstract
This paper describes formative evaluation by comparing two books, one by
Tessmer and one by Flagg. One can employ Tessmer's stages of formative evaluation
(expert review, one-to-one, small group, and field test) in instructional design, or
Flagg's steps (pre-production, production, and implementation) in instructional
technologies. This paper recommends Tessmer's book for beginners because of its
detailed explanation of how to plan the evaluation, collect the data, and analyze it.
A. Background
Having no methods of its own, evaluation has always borrowed strategies from all
the social sciences (Krathwohl, 1997). Evaluation can evaluate almost anything,
such as a person, a curriculum, a student, a process, a product, or a program. Every
evaluation conducted by experts has its own name, such as personnel
evaluation, program evaluation, or teaching evaluation.
Evaluation also differs from research not by its methods but by other aspects, such
as being decision-driven (to facilitate making decisions) and utilization-oriented (the
usefulness of the process). The intent of evaluation is to reduce uncertainty and to
provide an information-rich decision-making environment. Compared to research,
evaluation gives a better information basis for action (decision).
Evaluation is variously conceived as a tool for more effective program management,
a means of empowerment for those affected by programs, an effort to be responsive
to the concerns of stakeholders, an exercise directed by highly competent
professional opinion, a means to conclusions and recommendations, or a process of
negotiation among stakeholders to produce an agenda for further negotiation. Some
experts see evaluation as a matter of connoisseurial judgement by an area's experts,
as best done through naturalistic, descriptive qualitative research, or as embedded in
measurement and experimentation (Krathwohl, 1997).
Evaluation is oriented primarily toward gathering information that will facilitate
improving a person, a curriculum, a student, a process, a program or a product
(formative) or that will help determine its value or worth (summative). Many experts
have analyzed the difference between formative and summative evaluation. Markle
(1989, in Tessmer, 1993) mentioned that summative evaluation is an evaluation to
prove but formative evaluation is an evaluation to improve the programs or the
products. Baker and Alkin (1973, in Tessmer, 1993) characterized the difference
between the two as evaluation for validation (summative) versus evaluation for
revision (formative). Scriven (1991, in Krathwohl, 1997)
quoted Robert Stake on the formative and summative distinction, "When the cook
tastes the soup, that is formative evaluation; when the guest tastes it, that is
summative evaluation".
This article thoroughly describes formative evaluation as a tool for improving
instructional programs, products, and materials. By comparing two books
of formative evaluation, i.e. "Planning and conducting formative evaluation" by Martin
Tessmer (1993) and "Formative evaluation for educational technologies" by Barbara
N. Flagg (1990), this paper will summarize and analyze the following questions:
1. What is formative evaluation?
2. When is formative evaluation conducted, and why or why not?
3. What stages of formative evaluation are there?
4. How to plan formative evaluation?
5. What are the remarks and critical issues in conducting formative evaluation?
This paper will give recommendations for using the two formative evaluation books.
It will focus on answering those questions and describing the books' strengths and
weaknesses by analyzing them critically and by giving opinions and reasons as to
whether the books are valuable and usable.
B. Definition
Tessmer (1993) explicitly defined formative evaluation as a judgement (of the
strengths and weaknesses of instruction in its developing stages) made for the
purpose of revising the instruction to improve its effectiveness and appeal. Put
simply, evaluation is a process of gathering data to determine the worth or value of
instruction, its strengths and its weaknesses. The evaluation is conducted by
collecting data about the instruction from a variety of sources, using a variety of
data-gathering methods and tools. The reader should understand that the process of
gathering data is very important, since formative evaluation is a judgement for
improving the effectiveness of the instruction (products, programs, or materials).
Flagg (1990) does not give an explicit definition of formative evaluation. She refers
to formative evaluation as the process of gathering information to advise design,
production, and implementation. Discussing three case studies ("Sesame Street",
"The Business Disc", and "Puppet Theater"), the book explains the phases (design,
production, and implementation) of formative evaluation in each case. Explicitly, the
book also mentions that formative evaluation is valuable in the decision-making
process during the design of computer software and videodiscs, works in production
settings, and facilitates decisions in the implementation process.
These two definitions describe formative evaluation as a process of collecting data
to be used to judge the strengths and weaknesses of instruction in order to revise
and improve programs, products, and materials. This judgement is a guideline for the
researcher to improve the quality, effectiveness, and efficiency of the programs,
products, and materials. It can also be used to decide whether the programs,
products, and materials should be continued or cancelled, revised or changed,
improved or discarded. Both books consider formative evaluation an important step,
and one should understand that the continuation of the programs, products, and
materials depends on its result.
Scriven (1967, in Tessmer, 1993 and Flagg, 1990) attached the name "formative
evaluation" to a revision process that referred to an outcome evaluation of an
intermediate stage in the development of the teaching instrument. Using the same
name given by Scriven, Tessmer (1993) mentioned other names for formative
evaluation used by other experts, i.e. "try out", "developmental testing", "pilot study",
"formative assessment", "dry run", "alpha/beta testing", "quality control", and "learner
verification and revision". Tessmer preferred "quality control", but since that term
does not describe the actual meaning of formative evaluation or the actual target
who is going to judge the effectiveness and quality of the products, "learner
verification and revision" is a better name. But what is in a name? For formative
evaluation, the process of conducting it is the most important thing to plan and carry
out thoroughly.
Flagg (1990) gave no specific name for formative evaluation and no reason why the
name is included in each phase of evaluation, i.e. pre-production formative
evaluation, production formative evaluation, and implementation formative
evaluation. These names refer to the collection of information to guide decisions
during the design, production, and implementation phases, respectively.
C. When, Why, and Why Not
Flagg (1990) mentioned that the only reason for performing formative evaluation is to
inform the decision-making process during the design, production, and
implementation process. The main reasons why formative evaluation is needed are
to understand the content, attitudes toward the content, interest in the content, and
learners' experience with the medium in the design phase; to reduce expensive
mistakes and improve user friendliness in the production phase; and to restructure
the products for different settings in the implementation phase. In particular,
formative evaluation is warranted for novice designers, for the implementation of
new content, for the application of new technologies, for different target learners, for
unfamiliar strategies, for the accuracy of critical performance, for large-scale
dissemination, and when little chance for revision is given (Tessmer, 1993).
Considering the importance of formative evaluation and by analyzing several studies,
Tessmer stated that there were three main results of doing formative evaluation in
instruction. First, using formative evaluation in all types of instruction
(computer-based, simulation, games, texts, and multimedia) can improve the learning
effectiveness of materials. Second, even though there is not enough evidence of
whether the instruction is more interesting or motivating, formative evaluation can be
used to obtain criticism and suggestions on the interest or motivation of the instruction for its
users. Third, since practitioners used some types of formative evaluation (no specific
names) in their projects, formative evaluation appears to be part of the "real world" of
instructional design.
Through three case studies, Flagg (1990) demonstrated the need for formative
evaluation to inform the decision-making process during the design, production, and
implementation stages of educational programs, with the purpose of improving the
program. The Sesame Street staff conducted research on children's conceptual
understanding of death in order to provide information useful in scripting the
program. In the production phase, user observation gave the producers of The
Business Disc feedback to improve user friendliness before the instructional
program reached a stage where changes would be cost prohibitive. Formative
evaluation gave feedback to the developers of Puppet Theater to reconfigure the
program for different contexts and users; they reworked the program and added
tools in response to formative data from other users.
Even though formative evaluation is frequently used by practitioners, most
organizations do not accept it as part of their programs. They do not understand the
purpose or utility of formative evaluation because they think that it is only for
evaluating finished products, for incompetent or inexperienced designers, and for
insufficiently skilled evaluators. Flagg also mentioned six reasons why formative
evaluation is not given a place in the development of educational materials in
electronic technologies. The major excuses are time (being under pressure to
produce by certain deadlines), money (only a small percentage of the production
budget), human nature (a perceived constraint on creativity), unmet expectations
(unrealistic expectations), measurement difficulties (long-term objectives are difficult
to measure), and lack of knowledge (being unaware of the philosophy and methods).
In addition, formative evaluation is not worthwhile if those in control of the project
disagree with its philosophy, if developers cannot agree on the goals of the program
and the intended audience, or if there is no chance for change.
D. Stages
Tessmer (1993) suggested four classically recognized types of formative evaluation:
1. Expert review: experts review the instruction, with or without the evaluator
2. One-to-one: one learner at a time reviews the instruction with the evaluator
and comments upon it
3. Small group: the evaluator tries out the instruction with a group of learners
and records their performances and comments
4. Field test: the evaluator observes the instruction being tried out in a realistic
situation with a group of learners.
In order to help readers understand the concept of formative evaluation, Tessmer
drew two figures of the stages of formative evaluation; the figure below represents
the conclusion of both. With only brief explanation of how to apply self-evaluation,
Tessmer suggested conducting expert reviews and one-to-one evaluations together
after self-evaluation, revising the instruction, conducting a small group, revising the
instruction again, holding the field test, and revising and improving the instruction
one last time. One can use variations of those types in the four steps, such as expert
panels (a team of experts and the evaluator), two-to-one (two learners review the
instruction with the evaluator), and rapid prototyping (immediate field-test
evaluation).
Unfortunately, Tessmer did not mention whether, or how, one can combine those
types in each step of the evaluation plan. Another issue Tessmer did not explain is
what should be done, when expert reviews and one-to-one evaluations are
conducted together, to revise the prototype if the expert and one-to-one evaluations
disagree about some aspect of the prototype. To reduce this kind of confusion, this
paper suggests conducting each step carefully, one step at a time; nevertheless,
whenever resources (for instance time, money, and manpower) offer an opportunity
to go back to a previous step, one can do so. Tessmer's book did not mention this
option. He assumed that whenever one follows the steps precisely, considering the
suggestions in the book thoroughly, one will not have any difficulties or make
unreasonable errors in doing formative evaluation. This is why he mentioned that
doing formative evaluation depends on the thoroughness of the plan. Another
possible way of combining Tessmer's steps is to decide early what kind of
information one wants to gain and in what step, or from whom, the information
should be gained. Planning this kind of information is advantageous for combining
the steps, as long as the resources (time, money, and manpower) are available, the
resources can be used effectively and efficiently, and the steps can be
accomplished.
Flagg (1990) explained the stages of formative evaluation by considering the
development of the program and the evaluation steps themselves. The following
describes formative evaluation for television, software, and videodisc.
Program Development    Evaluation
Planning               Needs Assessment
Design                 Pre-production Formative Evaluation
Production             Production Formative Evaluation
Implementation         Implementation Formative Evaluation
Drawing on other experts, Flagg (1990) described the first evaluation phase as
needs assessment or front-end analysis, which seeks the rationale for the program,
its content, and the feasibility of the delivery system. The second phase (pre-
production formative evaluation) turns the conceptualization of the planning phase
into preliminary scripts that guide the pre-production of the program. The third phase
(production formative evaluation) revises the early program versions with the target
group. The implementation formative evaluation phase examines how the program
operates with target learners in the environment for which it was designed. These
phases are described explicitly in chapters 4-7, with chapter 8 devoted solely to
implementation formative evaluation.
These two similar but distinct sets of stages describe each step in detail but with
different purposes. Tessmer explained each step for the purpose of conducting
formative evaluation in instructional design in general, whereas Flagg described the
steps for the purpose of evaluating instructional technologies. In general, the two
sets of stages use similar approaches, questions, concerns, and measurement tools
in conducting formative evaluation.
E. Plan
Tessmer suggested that the formative evaluation process can be done in the
following steps: determining the goals of, resources for, and constraints upon the
evaluation; conducting task analysis; describing the learning environment;
determining the media characteristics; outlining the information sought; selecting
data-gathering methods and tools; and implementing the evaluation. Tessmer
described each step not only by giving questions that should be considered in doing
formative evaluation, but also by constructing some answers for them. For instance,
the answers to "What do you want to find out from the evaluation?" are learning
effectiveness, learner interest/motivation, content quality, technical quality, and
implementability.
For conducting formative evaluation in instructional technologies, Flagg gave four
criteria: usability (usable for decision making), practicality (answers can be gained
within the time limits and money available), importance (relevant to the objectives
and situation), and uncertainty (the answers to the questions are genuinely
uncertain). The methods used in formative evaluation can follow the hypothetico-
deductive paradigm (a top-down approach with theory-based hypotheses) to confirm
or explore causal relationships between or among variables, or the inductive
paradigm (a bottom-up approach), which begins with the collection of qualitative and
quantitative data directed by the evaluation question.
Giving much attention to each step of formative evaluation (22 pages for expert
reviews, 29 for one-to-one, 35 for small group, and 16 for field test), Tessmer
described in detail how to conduct each step. Expert review is conducted to evaluate
the clarity of objectives and content, the practicality (technical quality) of the
prototype, and the validity of the materials by using connoisseurial, or expert-
judgement, reviews. Considering many aspects, Tessmer stated that in expert
reviews there are many types of experts and many types of questions that should or
should not be asked of each one, depending upon each expert's strengths and the
goals of the evaluation. To be more thorough in doing expert reviews, Tessmer gave
an example at the end of each chapter.
The advantages of one-to-one evaluation are that it is interactive and highly
productive, easy, quick, and inexpensive; it is a good source of revision information
and checks the clarity of the instruction and directions, the completeness of the
instruction, and the adequacy of the quality of the materials. Small-group evaluation
evaluates the effectiveness, appeal, and implementability of the approach. It also
gives the study many advantages, such as low cost, ease of conduct, more accurate
measures of teachers' performance, and more improvement in the instruction
prototype. Field-test evaluation is conducted to describe the teacher acceptance,
implementability, and organizational acceptance of the prototype approach. It can be
used to confirm the revisions made in previous formative evaluations, to generate
final revision suggestions, to investigate the effectiveness of the prototype
instruction, and to obtain the polished version of the products and programs.
Tessmer also provided an example in each of the chapters on one-to-one, small-
group, and field-test evaluation.
Drawing on other experts, Flagg (1990) described the first evaluation phase as
needs assessment or front-end analysis, which seeks the rationale for the need for
the program, the content, and the feasibility of the delivery system. Data gathering
entails reviews of existing studies, tests, and curricula, expert reviews, and
measurement of target-audience characteristics. The second phase is called pre-
production formative evaluation; here the conceptualization of the planning phase
guides the pre-production of the program through preliminary scripts or a writers'
notebook. In electronic learning projects, researchers include the target audience
and teachers in the process of making design decisions about content, objectives,
and production formats, while expert reviews (of content and design) are used to
guide the creativity of the designers and reduce the uncertainty of some critical
decisions. The third phase is called production formative evaluation; here the
program is revised considering feedback from tryouts of early program versions with
the target group. Information on user friendliness, comprehensibility, appeal, and
persuasiveness can give the production team confidence in their revisions and
decisions. The subject-matter specialists, the designers, and other experts can work
together to improve the versions. The implementation formative evaluation phase is
concerned with how the program operates with target learners in the environment for
which it was designed. Field testing helps designers see how program managers will
really use their final products with target learners, and the feedback aids the
development of support materials and future programs. This phase differs from
summative evaluation, since the latter measures learners who have not yet been
exposed to the program.
While Tessmer gave specific methods for doing formative evaluation, along with
guidelines for collecting and analyzing data in each step, Flagg described many
alternative measurements, such as self-report (respondents answer questionnaires
or interviews), observation (which renders an unobtrusive and objective record),
tests, and records (collections of data). Even though the "how" of conducting these
measures is not elaborated, Flagg, quoting Mielke (1973), stated that the superiority
or inferiority of a research method cannot be established as an inherent quality, but
only in terms of its performance in answering questions.
In order to design sophisticated software, one must evaluate user friendliness
(accessibility, responsiveness, flexibility, and memory), reception (attention, appeal,
and excitement), and outcome effectiveness (the motor skills, cognitive abilities, or
attitudes that the learners have to learn). Methods such as observation and
mechanical recording devices will yield valuable information relevant to the
accessibility, responsiveness, and flexibility of computer-based educational
programs, as will self-report (think-aloud, escorted trial, and diary) and tests. Giving
much elaboration only to visual attention, the book elaborates on each of these self-
report methods, especially the program evaluation analysis computer (PEAK)
system. The book also does not explain why only self-report methods are suitable
for measuring these variables.
F. Miscellaneous
In general, the two books give different names to similar stages of formative
evaluation. Tessmer gave attention to instructional design in general, while Flagg
focused on instructional technologies. Tessmer described each stage in detail and
gave an example for each, whereas Flagg gave five examples and described each
stage within each example explicitly. The following table summarizes the statistics
of, and more general differences between, the two books:
Characteristics        Flagg's Book                Tessmer's Book
Pages                  259                         159
Chapters               12                          6
Examples               5 in chapters 4-8;          1 in chapters 3-6
                       3 in chapter 2
ISBN                   0-8058-01278                0-7494-08014
Price                  Fl. 30                      £16.95
Publisher              Lawrence Erlbaum            Kogan Page
Year of issue          1990                        1993
Subject index          Exists                      Exists
Author index           Exists                      No
Glossary               No                          Exists
References             In each chapter             At the end
Definition             Implicit                    Explicit
Summary                3 in each chapter except    1 in each chapter
                       chapters 2, 4-8
History of formative   Exists                      Exists
evaluation
G. Recommendation
The first weakness of Flagg's book is the structure of its content. Since there is no
suggestion on how to use or read the book, one should read the introduction
(chapter 1) carefully in order to get an overview of its content. Since the book is
meant to give students, practitioners, researchers, designers, and developers an
in-depth view of the process of formative evaluation through many examples,
readers should spend more time on the first chapter, go to the summary of each
chapter (chapters 5, 6, and 8 have no summary), and then begin to read the book in
detail. The book is difficult for beginners to understand but is easy to read for the
experienced. The reason could be that no formative evaluation was conducted on
the book before it was issued.
For beginners, Tessmer's book is very useful and understandable because it is well
structured. This is not only because the book has just six chapters, but also perhaps
because expert review and field testing were conducted before it was issued. Even
without reading "A Note to my readers about this Book", one can understand what,
how, when, why, and why not to do formative evaluation in instructional design. The
book can usefully be applied not only to instructional design in education but also to
instructional technology. It reads like a "cookbook" in which one can read about,
plan, create, and implement formative evaluation by considering all the steps,
strategies, questions, and suggestions included in it.
For conducting developmental research, these books are useful resources. They
describe in detail how to conduct formative evaluation, how to collect data, and how
to analyze the data in order to improve instructional design or instructional
technologies. Using the books as guidelines for conducting formative evaluation is
advantageous, but the most important thing is that the individual should have a good
and strong attitude toward conducting evaluation. This is not only because it will
motivate the person to do it, but also because it will help the person to understand,
collect, compare, and analyze the data virtuously and thoroughly in order to obtain
the best account of the phenomena in the evaluation. Formative evaluation requires
a fair, truthful, and honest attitude from the person conducting it; otherwise the result
is only GIGO (garbage in, garbage out).
References:
Flagg, Barbara N. 1990. Formative evaluation for educational technologies. Hillsdale,
New Jersey: Lawrence Erlbaum Associates, Publisher.
Krathwohl, David R. 1998. Methods of educational & social science research: An
integrated approach. New York: Addison-Wesley Longman, Inc.
Tessmer, Martin. 1993. Planning and conducting formative evaluation. London:
Kogan Page Limited.