Eval Guide
A Self-Study Guide
August 2005
Suggested Citation:
U.S. Department of Health and Human Services. Centers for Disease Control and Prevention. Office of the
Director, Office of Strategy and Innovation. Introduction to program evaluation for public health programs: A
self-study guide. Atlanta, GA: Centers for Disease Control and Prevention, 2005.
Acknowledgments
This manual integrates, in part, the excellent work of the many CDC programs that have used CDC’s
Framework for Program Evaluation in Public Health to develop guidance documents and other
materials for their grantees and partners. We thank in particular the Office on Smoking and Health,
and the Division of Nutrition and Physical Activity, whose prior work influenced the content of this
manual.
We thank the following people from the Evaluation Manual Planning Group for their assistance in
coordinating, reviewing, and producing this document. In particular:
We extend special thanks to Daphna Gregg and Antoinette Buchanan for their careful editing and
composition work on drafts of the manual, and to the staff of the Office of the Associate Director of
Science for their careful review of the manual and assistance with the clearance process.
Contents
Executive Summary
Introduction
Step 6: Ensure Use of Evaluation Findings and Share Lessons Learned
Glossary
This document is a “how to” guide for planning and implementing evaluation activities. The manual is
based on CDC’s Framework for Program Evaluation in Public Health, and is intended to assist state,
local, and community managers and staff of public health programs in planning, designing,
implementing, and using the results of comprehensive evaluations in a practical way. The strategy
presented in this manual will help assure that evaluations meet the diverse needs of internal and external
stakeholders, including assessing and documenting program implementation, outcomes, efficiency, and
cost-effectiveness of activities, and taking action based on evaluation results to increase the impact of
programs.
Public health programs have as their ultimate goal preventing or controlling disease, injury, disability,
and death. Over time, this task has become more complex as programs themselves have become more
complex. Increasingly, public health programs address large problems, the solutions to which must
engage large numbers of community members and organizations in a vast coalition. More often than not,
public health problems, which in the last century might have been solved with a vaccine or a change in
sanitary systems, involve significant and difficult changes in the attitudes and risk/protective
behavior of consumers and/or providers.
In addition, the context in which public health programs operate has become more complex. Programs
that work well in some settings fail dismally in others because of the fiscal, socioeconomic, demographic,
interpersonal, and interorganizational setting in which they are planted. At the same time that programs
have become more complex, the demands of policymakers and other stakeholders for accountability have
increased.
All these changes in the environment in which public health programs operate mean that strong program
evaluation is more essential than ever, but also that there is no one “right” evaluation. Rather, over the
life of a program, a host of evaluation questions may arise, any of which might reasonably be asked at a
given point in time. Addressing these questions about program effectiveness means paying attention to
documenting and measuring the implementation of the program and its success in achieving intended
outcomes, and using such information to be accountable to key stakeholders.
Program Implementation
The task of evaluation encourages us to examine the operations of a program, including which activities
take place, who conducts the activities, and who is reached as a result. In addition, evaluation will show
how faithfully the program adheres to implementation protocols. Through program evaluation, we can
determine whether activities are implemented as planned and identify program strengths, weaknesses,
and areas for improvement.
For example, a treatment program may be very effective for those who complete it, but the number of
participants may be low. Program evaluation may identify the location of the program or lack of
transportation as a barrier to attendance. Armed with this information, program managers can move the
class location or meeting times or provide free transportation, thus enhancing the chances the program
will actually produce its intended outcomes.
Program Accountability
Program evaluation is a tool with which to demonstrate accountability to the array of stakeholders, who
for a given program may include funding sources, policymakers, state and local agencies implementing
the program, or community leaders. Depending on the needs of stakeholders, program evaluation
findings may demonstrate that the program makes a contribution to reducing morbidity and mortality or
relevant risk factors; or that money is being spent appropriately and effectively; or that further funding,
increased support, and policy change might lead to even more improved health outcomes. By holding
programs accountable in these ways, evaluation helps ensure that the most effective approaches are
maintained and that limited resources are spent efficiently.
This manual is based on CDC’s Framework for Program Evaluation in Public Health,1 and integrates
insights from Framework-based manuals developed by CDC’s Office on Smoking and Health,2 and
Division of Nutrition and Physical Activity3 for their grantees and state and local partners, and by the
Center for the Advancement of Community Based Public Health for community health programs.4 This
document is organized around the six steps of the CDC Framework:
• Engage Stakeholders
• Describe The Program
• Focus The Evaluation
• Gather Credible Evidence
• Justify Conclusions
• Ensure Use of Evaluation Findings and Share Lessons Learned
Each chapter illustrates the main points using examples inspired by real programs at the Federal, state,
and local levels. In addition, following each chapter are supplementary materials that apply the main
points of the chapter to your specific public health problem or area. These supplementary materials
include one or more crosscutting case examples relevant to the specific public health area.
1
Centers for Disease Control and Prevention. Framework for program evaluation in public health. Atlanta, GA:
MMWR 1999;48(No. RR-11):1-40.
2
US Department of Health and Human Services. Introduction to program evaluation for comprehensive tobacco control
programs. Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention,
Office on Smoking and Health, November 2001.
3
US Department of Health and Human Services. Physical activity evaluation handbook. Atlanta, GA: US Department of
Health and Human Services, Centers for Disease Control and Prevention, 2002.
4
Center for Advancement of Community Based Public Health. An evaluation framework for community health
programs. Durham, NC: Center for Advancement of Community Based Public Health, June 2000.
What makes true program evaluation different from the sort of informal assessment that any smart
and dedicated manager is doing all the time? Mainly, it’s that evaluation is conducted according to a
set of guidelines (protocols) that are systematic, consistent, and comprehensive to assure the
accuracy of the results. For purposes of this manual, we will define program evaluation as “the
systematic collection of information about the activities, characteristics, and outcomes of programs
to make judgments about the program, improve program effectiveness, and/or inform decisions
about future program development.”6 Program evaluation does not occur in a vacuum; rather, it is
influenced by real-world constraints. Evaluation should be practical and feasible and must be
conducted within the confines of resources, time, and political context. Moreover, it should serve a
useful purpose, be conducted in an ethical manner, and produce accurate findings. Evaluation
findings should be used both to make decisions about program implementation and to improve
program effectiveness.
As you will see, many different questions can be part of a program evaluation, depending on how
long the program has been in existence, who is asking the question, and why the information is
needed. In general, evaluation questions fall into one of these groups:
• Implementation: Were your program’s activities put into place as originally intended?
5
Scriven M. Minimalist theory of evaluation: The least theory that practice requires. American Journal of Evaluation
1998;19:57-70.
6
Patton MQ. Utilization-focused evaluation: The new century text. 3rd ed. Thousand Oaks, CA: Sage, 1997.
All of these are appropriate evaluation questions and might be asked with the intention of
documenting program progress, demonstrating accountability to funders and policymakers, or
identifying ways to make the program better.
Planning
Planning asks, “What are we doing and what should we do to achieve our goals?” Program
evaluation, by providing information on progress toward organizational goals and identifying which
parts of the program are working well and/or poorly, sets up the discussion of what can be changed
to help the program better meet its intended goals and objectives.
Performance Measurement
Increasingly, public health programs are called to be accountable to funders, legislators, and the
general public. Many programs do this by creating, monitoring, and reporting results for a small set
of markers and milestones of program progress. Such “performance measures” are a type of
evaluation—answering the question “How are we doing?” More importantly, when performance
measures show significant or sudden changes in program performance, program evaluation efforts
can be directed to the troubled areas to determine “Why are we doing poorly or well?”
Budgeting
Linking program performance to program budget is the final step in accountability. Called “activity-
based budgeting” or “performance budgeting,” it requires an understanding of program components
and the links between activities and intended outcomes. The early steps in the program evaluation
approach (such as logic modeling) clarify these relationships, making the link between budget and
performance easier and more apparent.
In the best of all worlds, surveillance and evaluation are companion processes that can be conducted
simultaneously. Evaluation may supplement surveillance data by providing tailored information to
answer specific questions about a program. Data collection that flows from the specific questions
that are the focus of the evaluation is more flexible than surveillance and may allow program areas
to be assessed in greater depth. For example, a state may supplement surveillance information with
detailed surveys to evaluate how well a program was implemented and the impact of a program on
participants’ knowledge, attitudes, and behavior. The state can also use qualitative methods (e.g., focus
groups, feedback from program participants from semistructured or open-ended interviews) to gain
insight into the strengths and weaknesses of a particular program activity.
7
Green LW, George MA, Daniel M, Frankish CJ, Herbert CP, Bowie WR, et al. Study of participatory research in
health promotion: Review and recommendations for the development of participatory research in health promotion in
Canada. Ottawa, Canada: Royal Society of Canada, 1995.
Program staff may be pushed to do evaluation by external mandates from funders, authorizers, or
others, or they may be pulled to do evaluation by an internal need to determine how the program is
performing and what can be improved. While push or pull can motivate a program to conduct good
evaluations, program evaluation efforts are more likely to be sustained when staff see the results as
useful information that can help them do their jobs better.
Data gathered during evaluation enable managers and staff to create the best possible programs, to
learn from mistakes, to make modifications as needed, to monitor progress toward program goals,
and to judge the success of the program in achieving its short-term, intermediate, and long-term
outcomes. Most public health programs aim to change behavior in one or more target groups and to
create an environment that reinforces sustained adoption of these changes, with the intention that
changes in environments and behaviors will prevent and control diseases and injuries. Through
evaluation, you can track these changes and, with careful evaluation designs, assess the effectiveness
and impact of a particular program, intervention, or strategy in producing these changes.
Recognizing the importance of evaluation in public health practice and the need for appropriate
methods, the World Health Organization (WHO) established the Working Group on Health
Promotion Evaluation. The Working Group prepared a set of conclusions and related
recommendations to guide policymakers and practitioners.8 Recommendations immediately
relevant to the evaluation of comprehensive public health programs include:
• Encourage the adoption of participatory approaches to evaluation that provide meaningful
opportunities for involvement by all of those with a direct interest in initiatives (programs,
policies, and other organized activities).
• Require that a portion of total financial resources for a health promotion initiative be
allocated to evaluation; the Working Group recommends 10%.
• Ensure that a mixture of process and outcome information is used to evaluate all health
promotion initiatives.
• Support the use of multiple methods to evaluate health promotion initiatives.
• Support further research into the development of appropriate approaches to evaluating health
promotion initiatives.
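As a worked illustration of the 10% recommendation above, the following minimal Python sketch computes the amount to set aside for evaluation. The function name, the default share parameter, and the budget figure are assumptions chosen for illustration; they do not come from the WHO report or from this manual.

```python
def evaluation_budget(total_budget: float, share: float = 0.10) -> float:
    """Return the portion of a total budget to reserve for evaluation.

    The 0.10 default mirrors the 10% figure cited in the text; any other
    share is an assumption the caller must justify.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return round(total_budget * share, 2)


# Hypothetical initiative with a total budget of 250,000 (any currency).
reserved = evaluation_budget(250_000)
print(f"Reserve {reserved:,.2f} for evaluation")
```

A program that must work with a smaller share can pass a different value for share, but the sketch makes the size of the trade-off explicit.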
8
WHO European Working Group on Health Promotion Evaluation. Health promotion evaluation: Recommendations to
policy-makers: Report of the WHO European working group on health promotion evaluation. Copenhagen, Denmark:
World Health Organization, Regional Office for Europe, 1998.
The underlying logic of the Evaluation Framework is that good evaluation does not merely gather
accurate evidence and draw valid conclusions, but produces results that are used to make a difference.
To maximize the chances evaluation results will be used, you need to create a “market” before you
create the “product” (the evaluation). You determine the market by focusing your evaluations on
questions that are most salient, relevant, and important. And you ensure the best evaluation focus by
understanding where the questions fit into the full landscape of your program description, and
especially by ensuring that you have identified and engaged stakeholders who care about these
questions and want to take action on the results.
[Figure 1.1: Evaluation Framework. Steps: Engage Stakeholders; Describe the Program; Focus the
Evaluation Design; Gather Credible Evidence; Justify Conclusions; Ensure Use and Share Lessons
Learned. Standards: Utility, Feasibility, Propriety, Accuracy.]
The steps in the CDC Framework are informed by a set of standards for evaluation.12 These
standards do not constitute a way to do evaluation; rather, they serve to guide your choice from
among the many options available at each step in the Framework. The 30 standards cluster into four
groups:
• Utility: Who needs the evaluation results? Will the evaluation provide relevant information
in a timely manner for them?
• Feasibility: Are the planned evaluation activities realistic given the time, resources, and
expertise at hand?
9
Public Health Functions Steering Committee. Public health in America. Fall 1994. Available at
<http://www.health.gov/phfunctions/public.htm>. January 1, 2000.
10
Dyal WW. Ten organizational practices of public health: A historical perspective. American Journal of Preventive
Medicine 1995;11(6)Suppl 2:6-8.
11
Centers for Disease Control and Prevention. op cit.
12
Joint Committee on Standards for Educational Evaluation. The program evaluation standards: How to assess
evaluations of educational programs. 2nd ed. Thousand Oaks, CA: Sage Publications, 1994.
Sometimes the standards broaden your exploration of choices; as often, they help reduce the options
at each step to a manageable number. For example, in the step “Engaging Stakeholders,” the
standards can help you think broadly about who constitutes a stakeholder for your program, but
simultaneously can reduce the potential list to a manageable number by posing the following
questions based on the standards: (Utility) Who will use these results? (Feasibility) How much
time and effort can be devoted to stakeholder engagement? (Propriety) To be ethical, which
stakeholders need to be consulted, for example, those served by the program or the community in
which it operates? (Accuracy) How broadly do you need to engage stakeholders to paint an
accurate picture of this program?
Similarly, there are unlimited ways to “gather credible evidence.” Asking these same kinds of
questions as you approach evidence gathering will help identify the methods that will be most useful,
feasible, proper, and accurate for this evaluation at this time. Thus, the CDC Framework approach
supports the fundamental insight that there is no such thing as the right program evaluation. Rather,
over the life of a program, any number of evaluations may be appropriate, depending on the
situation.
Although this staff person should have the skills necessary to competently coordinate evaluation
activities, he or she can choose to look elsewhere for technical expertise to design and implement
specific tasks. However, developing in-house evaluation expertise and capacity is a beneficial goal
for most public health organizations.
Of the characteristics of a good evaluator listed in the accompanying text box, the evaluator’s ability
to work with a diverse group of stakeholders warrants highlighting. The lead evaluator should be
willing and able to draw out and reconcile differences in values and standards of different
stakeholders and to work with knowledgeable stakeholder representatives in designing and
conducting the evaluation.
Additional evaluation expertise sometimes can be found in programs within the health department,
through external partners (e.g., universities, organizations, companies), from peer programs in other
states and localities, and through technical assistance offered by CDC.13
You can also use outside consultants as volunteers, advisory panel members, or contractors.
External consultants can provide high levels of evaluation expertise from an objective point of view.
Important factors to consider when selecting consultants are their level of professional training,
experience, and ability to meet your needs. Overall, it is important to find a consultant whose
approach to evaluation, background, and training best fit your program’s evaluation needs and goals.
Be sure to check all references carefully before you enter into a contract with any consultant.
To generate discussion around evaluation planning and implementation, several states have formed
evaluation advisory panels. Advisory panels typically generate input from local, regional, or
national experts otherwise difficult to access. Such an advisory panel will lend additional credibility
to your efforts and prove useful in cultivating widespread support for evaluation activities.
The evaluation team members should clearly define their respective roles. For some teams, informal
consensus may be enough; others prefer a written agreement that describes who will conduct the
evaluation and assigns specific roles and responsibilities to individual team members. Either way, the
team must clarify and reach consensus on the
• Purpose of the evaluation
• Potential users of the evaluation findings and plans for dissemination
• Evaluation approach
13
CDC’s Prevention Research Centers (PRC) program is an additional resource. The PRC program is a national network
of 24 academic research centers committed to prevention research and the ability to translate that research into programs
and policies. The centers work with state health departments and members of their communities to develop and evaluate
state and local interventions that address the leading causes of death and disability in the nation. Additional information
on the PRCs is available at www.cdc.gov/prc/index.htm.
The agreement should also include a timeline and a budget for the evaluation.
14
These cases are composites of multiple CDC and state and local efforts that have been simplified and modified to
better illustrate teaching points. While inspired by real CDC and community programs, they are not intended to reflect
the current operation of these programs.
The first step in the CDC Framework approach to program evaluation is to engage the stakeholders.
Stakeholders are people or organizations that are invested in the program, are interested in the results
of the evaluation, and/or have a stake in what will be done with the results of the evaluation.
Representing their needs and interests throughout the process is fundamental to good program
evaluation.
Clearly, these categories are not mutually exclusive; in particular, the primary users of evaluation
findings are often members of the other two groups, i.e., the program management or an advocacy
organization or coalition. While you may think you know your stakeholders well, these categories
help you to think broadly and inclusively in identifying stakeholders.
In reviewing the long list of stakeholders that might be generated in the three generic categories, use
of some or all of the evaluation standards will help identify those who matter most.
Use of results will be enhanced if you give priority to those stakeholders who
• Can increase the credibility of your efforts or your evaluation
• Are responsible for day-to-day implementation of the activities that are part of the program
• Will advocate for or authorize changes to the program that the evaluation may recommend
• Will fund or authorize the continuation or expansion of the program.
In addition, to be proper/ethical and accurate, you need to include those who participate in the
program and are affected by the program or its evaluation.
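The prioritization described above can be sketched as a small Python routine. This is only an illustration, not part of the CDC worksheets: the class, field names, and sample stakeholders are hypothetical. The four criteria labels paraphrase the bullets above, and the extra flag covers those who participate in or are affected by the program.

```python
from dataclasses import dataclass, field

# The four priority criteria named in the text (labels are shorthand).
CRITERIA = ("credibility", "implementation", "advocacy", "funding")


@dataclass
class Stakeholder:
    name: str
    roles: set = field(default_factory=set)  # subset of CRITERIA
    participant_or_affected: bool = False    # included for propriety and accuracy


def key_stakeholders(stakeholders):
    """Return names of stakeholders who meet at least one criterion or who
    participate in or are affected by the program, ordered by criteria met."""
    selected = [
        s for s in stakeholders
        if (s.roles & set(CRITERIA)) or s.participant_or_affected
    ]
    # Stable sort: stakeholders meeting more criteria come first.
    selected.sort(key=lambda s: len(s.roles & set(CRITERIA)), reverse=True)
    return [s.name for s in selected]


# Hypothetical entries for illustration only.
candidates = [
    Stakeholder("State legislature", {"funding", "advocacy"}),
    Stakeholder("Program staff", {"implementation"}),
    Stakeholder("Families served", set(), participant_or_affected=True),
    Stakeholder("Local newspaper", set()),
]
print(key_stakeholders(candidates))
```

Counting criteria is a crude ordering. In practice the evaluation standards, not a tally, should guide which stakeholders matter most.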
The worksheets at the end of this chapter are intended to help you identify key stakeholders. For
example, in using the worksheets with the Childhood Lead Poisoning Prevention (CLPP) program,
we identified the stakeholders in the sample worksheet 1A (see Table 1.1). Note that some
stakeholders appear in more than one column; these are not exclusive classes of stakeholders so
much as four ways of thinking about stakeholders to ensure we were thinking as broadly as possible.
Second, note that not all categories have the same number of stakeholders. Indeed, for a simple
project, there may be very few stakeholders and some categories may have none at all. The sample
worksheet 1B (see Table 1.2) helped us identify the perspectives and needs of these key stakeholders
and the implications for designing and implementing our evaluation. Note in the CLPP example that
while all stakeholders may applaud our efforts to reduce elevated blood-lead levels (EBLL) in children, several stakeholders put
priority on outcomes that might or might not agree with our priorities. For example, private
physicians are most interested in “yield” of their screening efforts, while Congress cares about cost-
effectiveness. Note that advocacy groups, in addition to specific outcomes that may be priorities for
them, also have some preferences related to data collection—expressing a preference for methods
other than surveys. All of these insights are helpful at the start of an evaluation to ensure that the
evaluation goes smoothly and the results are used.
Table 1.2
CLPP Example: What Matters to Stakeholders
In addition, it can be beneficial to engage your program’s critics in the evaluation. In some cases,
these critics can help identify issues around your program strategies and evaluation information that
could be attacked or discredited, thus helping you strengthen the evaluation process. This
information might also help you and others understand the opposition’s rationale and could help you
engage potential agents of change within the opposition. However, use caution: It is important to
understand the motives of the opposition before engaging them in any meaningful way.
This emphasis on engaging stakeholders mirrors the increasing prominence in the research
community of participatory models or “action” research. A participatory approach combines
systematic inquiry with the collaboration of diverse stakeholders to meet specific needs and to
contend with broad issues of equity and justice. As noted earlier, The Study of Participatory
Research in Health Promotion, commissioned by the Royal Society of Canada, has published a set
of guidelines for use by evaluators and funding agencies in assessing projects that aspire to be
participatory.15 The guidelines emphasize that traditional ways of conducting health research in
populations must adapt to meet the educational, capacity-building, and policy expectations of more
participatory approaches if the results of the research are to make a difference.
15
Green LW, George MA, Daniel M, Frankish CJ, Herbert CP, Bowie WR, et al. op cit.
Standard      Questions
Utility       Who will use these results?
Feasibility   How much time and effort can be devoted to stakeholder engagement?
Propriety     Which stakeholders need to be consulted to conduct an ethical evaluation, for example, to
              ensure we will identify negative as well as positive aspects of the program?
Accuracy      How broadly do we need to engage stakeholders to paint an accurate picture of this program?
Identify stakeholders, using the three broad categories discussed: those affected, those
involved in operations, and those who will use the evaluation results.
Review the initial list of stakeholders to identify key stakeholders needed to improve
credibility, implementation, advocacy, or funding/authorization decisions.
Create a plan for stakeholder involvement and identify areas for stakeholder input.
Target selected stakeholders for regular participation in key steps, including writing the
program description, suggesting evaluation questions, choosing evaluation questions, and
disseminating evaluation results.
Category Stakeholders
1 Who is affected by the program?
Increase credibility of our evaluation | Implement the interventions that are central to this evaluation | Advocate for changes to institutionalize the evaluation findings | Fund/authorize the continuation or expansion of the program
Stakeholders What activities and/or outcomes of this program matter most to them?
Developing a comprehensive program description is the next step in the CDC Framework. A
comprehensive program description clarifies all the components and intended outcomes of the
program, thus helping you focus your evaluation on the most central and important questions. Note
that in this step you are describing the program and not the evaluation. In this chapter, you will use
a tool called “logic modeling” to depict these program components, but a program description can be
developed without using this or any tool.
This step can either follow the stakeholder step or precede it. In either case, the combination of
stakeholder engagement and program description produces clarity and consensus long before data
are available to measure program effectiveness. This clarity on activities, outcomes, and their
interrelationships sets the stage for good program evaluation; in addition, it can be helpful in strategic
planning and performance measurement, ensuring that insights from these various processes are
integrated.
In addition to specifying these components, a complete program description includes discussion of:
• Stage of Development. Is the program just getting started, is it in the implementation stage,
or has it been underway for a significant period of time?
• Context. What factors and trends in the larger environment may influence program success
or failure?
You need not start from scratch in defining the components of your program description. For
example, a good source for generating a list of outcomes is the goals and objectives that may already
exist for the program in its mission, vision, or strategic plan (see text box). The specific objectives
outlined in documents like Healthy People 2010 are another starting point for defining some
components of the program description for public health efforts (see
http://www.health.gov/healthypeople).
For example, the problem addressed by the affordable housing program is compromised life
outcomes for low-income families due to lack of stability and quality of housing environments. The
need addressed by the Childhood Lead Poisoning Prevention (CLPP) program is halting the
developmental slide that occurs in children with elevated blood-lead levels (EBLL).
Target Groups
Target groups are the various audiences that the program needs to move into action in order to make
progress on the public health problem. For the affordable housing program, action of some kind
needs to be taken by eligible families, volunteers, and funders/sponsors. For the CLPP program,
Outcomes
Outcomes16 are the changes in someone or something (other than the program and its staff) that you
hope will result from your program’s activities. For programs dealing with large and complex
public health problems, the ultimate outcome is often an ambitious and long-term one, such as
eliminating the problem or condition altogether or improving the quality of life of people already
affected. Hence, a strong program description usually provides details not only on the intended
long-term outcomes but also on the short-term and intermediate outcomes that precede them and the
sequence in which they are likely to occur.
The text box “A Potential Hierarchy of Effects” outlines a potential sequence for a program’s
outcomes (effects). Starting at the base of the hierarchy: Program activities aim to obtain
participation among targeted communities. Participants’ reactions to program activities affect their
learning: their knowledge, opinions, skills, and aspirations. Through this learning process, people and
organizations take actions that result in a change in social, behavioral, and/or environmental
condition that directs the long-term health outcomes of the community.
A Potential Hierarchy of Effects
6. Health Outcomes: Health indicators as end results
5. System and Environment Change: Changes in social, economic, or environmental conditions as
result of recommendations, actions, policies and practices implemented
4. Actions: Patterns of behavior adopted by target audiences
3. Learning: Knowledge, opinions, skills, and aspirations as end results
2. Reactions: Degree of interest; the feelings toward the program; acceptance of activities, and of
educational methods.
1. Participation: Number of people reached; characteristics of the people, frequency and intensity
of contact.
Source: Excerpted and Adapted from Bennett and Rockwell, 1995. Targeting Outcomes of Programs
In thinking about this hierarchy or any sequence of outcomes, keep in mind that the higher order
outcomes are usually the “real” reasons the program was created, even though the costs and difficulty
of collecting evidence increase as you move up the hierarchy. Evaluations are strengthened by
showing evidence at several levels of hierarchy; information from the lower levels helps to explain
results at the upper levels, which are longer term.
The sequence of outcomes for the affordable housing program is relatively simple: Families,
sponsors, and volunteers must be engaged and work together for several weeks to complete the
house, then the sponsor must sell the house to the family, and then the family must maintain the
house payments. For the CLPP program, there are streams of outcomes for each of the target groups:
Providers must be willing to test, treat, and refer EBLL children. Housing officials must be willing to
clean up houses that have lead paint, and families must be willing to get children and houses
screened, adopt modest changes in housekeeping behavior, and adhere to any treatment
16
Program evaluation and planning are replete with terms that are used inconsistently. In this document, the term
“outcomes” is used to refer to the intended changes that will result from the program. However, others may use different
terms to refer to the early and late outcomes: results, impacts, and outcomes is a typical sequence.
Activities
These are the actual actions mounted by the program and its staff to achieve the desired outcomes in
the target groups. Obviously, activities will vary with the program. Some typical program activities
may include, among others, outreach, training, funding, service delivery, collaborations and
partnerships, and health communication. For example, the affordable housing program must recruit,
engage, and train the families, sponsors, and volunteers, and also oversee construction and handle
the mechanics of home sale. The CLPP program does outreach and screening of children, and, for
those children with EBLL, does case management, referral to medical care, assessment of the home,
and referral of lead-contaminated homes for cleanup.
Outputs
Outputs are the direct products of activities, usually some sort of tangible deliverable produced as a
result of the activities. Outputs can be viewed as activities redefined in tangible or countable terms.
For example, the affordable housing program’s activities of engaging volunteers, recruiting
sponsors, and selecting families have the corresponding outputs: number of volunteers engaged,
number of sponsors recruited and committed, and number and types of families selected. The CLPP
activities of screening, assessing houses, and referring children and houses would each have a
corresponding output: the number of children screened and referred, and the number of houses
assessed and referred.17
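One way to see outputs as activities restated in countable terms is to tally a simple log of what the
program did. The sketch below is illustrative only: the record layout, the field names, and the sample
entries are assumptions made for this example and are not CLPP data.

```python
from collections import Counter

# Hypothetical activity log: one record per activity carried out by program staff.
activity_log = [
    {"activity": "screen_child", "referred": True},
    {"activity": "screen_child", "referred": False},
    {"activity": "screen_child", "referred": True},
    {"activity": "assess_house", "referred": True},
    {"activity": "assess_house", "referred": False},
]


def count_outputs(log):
    """Restate logged activities as countable outputs."""
    outputs = Counter()
    for record in log:
        # Every logged activity counts once toward "<activity>_done".
        outputs[record["activity"] + "_done"] += 1
        # Activities that ended in a referral also count toward "<activity>_referred".
        if record["referred"]:
            outputs[record["activity"] + "_referred"] += 1
    return dict(outputs)


outputs = count_outputs(activity_log)
print(outputs)
```

Note that these counts describe what the program produced. They say nothing yet about whether any
child or house has changed, which is the territory of outcomes.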
Resources/Inputs
These are the people, money, and information needed—usually from others outside the program—to
mount program activities effectively. It is important to include inputs in the program description
because accountability for resources to funders and stakeholders is often a focus of evaluation. Just
as important, the list of inputs is a reminder of the type and level of resources on which the program
is dependent. If, in fact, intended outcomes are not being achieved, the resources/inputs list reminds
you to look there for one reason that program activities could not be implemented as intended.
In the affordable housing program, for example, a supply of supervisory staff, community
relationships, land, and warehouse are all necessary inputs to activities. For the CLPP program,
funds, legal authority to screen children and houses, trained staff, and relationships with
organizations responsible for the activities that the program cannot undertake—in this case, medical
treatment and clean-up of homes—are necessary inputs to mount a successful CLPP program.
17
In trying to distinguish “outputs” from “outcomes,” remember that an outcome is a change in someone or something
other than the program and its staff. But also remember that these definitions are guidelines and are not set in stone.
Often, there are “gray areas” where something might be classified as an output by some programs and an outcome by
others. For example, the number of trainees attending my program is an outcome in the sense that someone other than
my program staff (the trainee) took an intentional action by attending the training, but many might
classify this as an output (number of trainees attending), since there really has not been a change in
the trainee.
For example, both the affordable housing and CLPP programs have been in existence for several
years and can be classed in the maintenance/outcomes achievement stage. Therefore, an evaluation
of these programs would probably focus on the degree to which outcomes have been achieved and
the factors facilitating or hindering the achievement of outcomes.
Context
The context is the larger environment in which the program is immersed. Because external factors
can present both opportunities and roadblocks, you should be aware of and understand them.
Program context includes politics, funding, interagency support, competing organizations,
competing interests, social and economic conditions, and history (of the program, agency, and past
collaborations).
For the affordable housing program, some contextual issues are the widespread beliefs in the power
of home ownership and in community-wide person-to-person contact as the best ways to transform
lives. At the same time, gentrification in low-income neighborhoods drives real estate prices up,
which can make some areas unaffordable for the program. And some communities, while approving
of affordable housing in principle, may resist construction of these homes in their neighborhood.
For the CLPP program, some contextual issues include increasing demands on the time and attention
of primary health care providers, the concentration of EBLL children in low-income and minority
neighborhoods, and increasing demands on housing authorities to ameliorate environmental risks.
A realistic and responsive evaluation will be sensitive to a broad range of potential influences on the
program. An understanding of the context also lets users interpret findings accurately and assess the
findings’ generalizability. For example, the affordable housing program might be successful in a
small town, but may not work in an inner-city neighborhood without some adaptation.
Logic models are graphic depictions of the relationship between a program’s activities and its
intended outcomes. Two words in this definition bear emphasizing:
The logic model requires no new thinking about the program; rather, it converts the raw material
generated in the program description into a picture of the program. The remainder of this chapter
provides the steps in constructing and elaborating simple logic models. The next chapter, Focus the
Evaluation Design, shows how to use the model to identify and address issues of evaluation focus
and design.
Logic models may depict all or only some of the elements of program description (see text box),
depending on the use to which the model is being put. For example, Exhibit 2.1 is a simple, generic
logic model. If relevant to the intended use, the model could include references to the remaining
components of program description, such as “context” or “stage of development.” Likewise, some
of the examples presented below focus mainly on the connection of a program’s activities to its
sequence of outcomes. Adding “inputs” and explicit “outputs” to these examples would be a simple
matter if needed.
Exhibit 2.1
Basic Program Logic Model
Note that Worksheet 2A at the end of this chapter provides a simple format for doing this
categorization of activities and outcomes, no matter what method is used. Here, for the CLPP, we
completed the worksheet using the first method.
For example, if the list of activities includes a needs assessment, distribution of a survey, and
development of a survey, most would conclude that the needs assessment of content should occur
first, and that the distribution of a survey must be preceded by development of the survey. Likewise,
among the outcomes, most would generally concede that change in knowledge and attitudes would
precede change in behavior.
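The sequencing judgment described above can be sketched as an ordering problem: list which items must
come before which, then derive an order that respects every such rule. The example below is a minimal
sketch using Python's standard-library graphlib module (Python 3.9 or later). The item labels are
shorthand for the survey and knowledge/behavior examples in the text, and the precedence rules are the
ones the text says most people would agree on.

```python
from graphlib import TopologicalSorter

# Each key must be preceded by every item in its set.
# Labels are shorthand chosen for this illustration.
must_follow = {
    "develop survey": {"needs assessment"},
    "distribute survey": {"develop survey"},
    "change in behavior": {"change in knowledge and attitudes"},
}


def logical_sequence(precedence):
    """Return one ordering in which every item appears after its prerequisites."""
    return list(TopologicalSorter(precedence).static_order())


sequence = logical_sequence(must_follow)
print(sequence)
```

Items that have no rule linking them, such as the survey steps and the outcome steps here, may appear
in any relative order. Deciding how to place those is still a judgment for the program and its
stakeholders.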
Worksheet 2B provides a simple format for expanding the initial two-column table. For the CLPP,
we expanded the initial two-column table to four columns. Note that no activities or outcomes have
been added. But the original lists have been spread over several columns to reflect the logical
sequencing. For the activities, we suggest that outreach, screening, and identification of EBLL
children need to occur in order to case manage, assess the houses, and refer the children and their
houses to follow-up. On the outcomes sides, we suggest that outcomes such as receipt of medical
treatment, clean-up of the house, and adoption of housekeeping changes must precede reduction in
EBLL and elimination of the resultant slide in development and quality of life.
Add any inputs and outputs. At this point, you may decide that the four-column logic model adds
all the clarity that is needed. If not, the next step is often to add columns for inputs and for outputs.
The inputs are inserted to the left of the activities while the outputs—as products of the activities—
are inserted to the right of the activities but before the outcomes.
For the CLPP, we can easily define and insert both inputs and outputs of our efforts. Note that the
outputs are the products of our activities, but do not confuse them with outcomes. No one has
changed yet; while we have identified a pool of leaded houses and referred a pool of EBLL children,
the houses have not been cleaned up, nor have the children been treated yet.
Draw arrows to depict intended causal relationships. The multi-column table of inputs,
activities, outputs, and outcomes that has been developed so far may contain enough detail,
depending on the purposes for which the model will be used. In fact, for conveying in a global way
the components of a program, it almost certainly will suffice. However, when the model is used to
set the stage for planning and evaluation discussions, the logic model will benefit from adding
arrows that show the causal relationships among activities and outcomes. These arrows may depict
a variety of relationships: from one activity to another, when the first activity exists mainly to feed
later activities; from an activity to an outcome, where the activity is intended to produce a change in
someone or something other than the program; from an early outcome to a later one, when the early
outcome is necessary to achieve the more distal outcome.
Examine the CLPP Logic Model (Exhibit 2.2) with causal arrows included. Note that no
activities/outputs or outcomes have been added. Instead, arrows were added to show the
relationships among activities and outcomes. Note also that streams of activities exist concurrently
to produce cleaned-up houses, medically “cured” children, and trained and active
households/families. It is the combination of these three streams that produces reductions in EBLL,
which is the platform for stopping the developmental slide and improving the quality of life.
[Figure: Exhibit 2.2, CLPP logic model with causal arrows.
Activities: Outreach; Screening; ID Children with EBLL; Case Management; Refer for Medical Treatment;
Do Environment Assessment; ID Source and Refer for Clean-up; Train Families.
Outcomes: Lead Source Removed; Family Performs In-home Techniques; Medical Management; EBLLs are
Reduced; Development and Intelligence Improve; More Productive and/or Quality Lives.]
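The causal arrows in a logic model can also be written down as a list of "from, to" pairs, which makes
it easy to trace each stream from an activity to a later outcome. The sketch below is one illustrative
reading of the CLPP narrative. The labels and the specific arrows are assumptions made for this
example, not a transcription of the exhibit.

```python
# Arrows as (from, to) pairs. Labels and links are an illustrative reading
# of the CLPP narrative, not a transcription of the published exhibit.
ARROWS = [
    ("outreach", "screening"),
    ("screening", "identify EBLL children"),
    ("identify EBLL children", "case management"),
    ("case management", "refer for medical treatment"),
    ("case management", "assess home"),
    ("case management", "train families"),
    ("refer for medical treatment", "medical management"),
    ("assess home", "refer for clean-up"),
    ("refer for clean-up", "lead source removed"),
    ("train families", "family performs in-home techniques"),
    ("medical management", "EBLL reduced"),
    ("lead source removed", "EBLL reduced"),
    ("family performs in-home techniques", "EBLL reduced"),
    ("EBLL reduced", "development improves"),
    ("development improves", "better quality of life"),
]


def paths(arrows, start, end):
    """Return every chain of arrows that leads from start to end."""
    successors = {}
    for src, dst in arrows:
        successors.setdefault(src, []).append(dst)

    found = []

    def walk(node, trail):
        if node == end:
            found.append(trail)
            return
        for nxt in successors.get(node, []):
            walk(nxt, trail + [nxt])

    walk(start, [start])
    return found


# The three concurrent streams that combine to reduce EBLL.
streams = paths(ARROWS, "case management", "EBLL reduced")
for stream in streams:
    print(" -> ".join(stream))
```

Tracing the streams this way is a quick check that every activity leads somewhere and that every
outcome has at least one activity or earlier outcome feeding it.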
Clean up the logic model. Early versions are likely to be sloppy, and a nice, clean one that is
intelligible to others often takes several tries.
• Elaborating distal outcomes: Sometimes the simple model will end with the short-term
outcomes or even outputs. While this may reflect a program’s mission, usually the program
has been created to contribute to some larger purpose, and depicting this in the model leads
to more productive strategic planning discussions later. This elaboration is accomplished by
asking “so then what happens?” of the last outcome depicted in the simple model, and then
continuing to ask that of all subsequent outcomes until more distal ones are included.
For example, in Exhibit 2.3, the very simple logic model that might result from a review of
the narrative about the home ownership program is elaborated by asking, “So then what
happens?” Note that the original five-box model remains as the core of the elaborated
model, but the intended outcomes now include a stream of more distal outcomes for both the
new home-owning families and also for the communities in which houses are built. As will
be discussed later, the elaborated model can motivate the organization to think more
ambitiously about intended outcomes and whether the right activities are in place to produce
them.
[Figure: Exhibit 2.3, elaborating distal outcomes for the home ownership program.
Simple model: Volunteers; Sponsors; Family; Build House; Sell House.
Labels in the elaborated model: Build House; “Successful” Home Ownership; Community; Family; Stability
of Neighborhood; Self-Esteem; Investment; Family Stability; Services; Personal Job/Education
Outcomes; Economic Development; Better Quality of Life for All.]
For example, the mission of many CDC programs can be displayed as a simple logic model
that shows key clusters of program activities and the key intended changes in a health
outcome(s) (Exhibit 2.4). The process of elaboration leads to the more detailed depiction of
how the same activities produce the major distal outcome, i.e., the milestones along the way.
Exhibit 2.4
Elaborating Intermediate Outcomes in Your Logic Models
[Figure: simple and elaborated versions of a generic CDC program logic model.
Simple model. Activity clusters: Capacity Building; Surveillance; Communication; Partnership; Research
and Development; Leadership. Intended changes: Change Physical Environments; Change Social
Environments; Prevent and Control Problem.
Elaborated model, activities by cluster.
Surveillance: Identify factors and populations.
Research & Development: Identify modifiable risk and protective factors and consequences; develop/test
interventions; create/identify best method and models.
Capacity Building: Support/develop frontline infrastructure; identify skills and resources needs.
Communication: Identify channels, audiences, and key information.
Partnership: Identify strategic partners.
Leadership: Access to leaders; forum for convening; develop research and other agendas.
Elaborated model, intermediate outcomes: Evidence-based models; strategies to implement models; best
implementation practices; propose policy changes; adopt changes in policies, laws and regulations;
network of strong frontline implementers; good training tools and programs; adopt practices and
programs; effective prevention messages; effective delivery channels; generate demand for tools;
change knowledge, beliefs, attitudes and behavior; activated constituency; access to key groups;
strong partnerships at all levels for prevention; shared vision; increased resources.
Elaborated model, distal outcomes: Change physical environment; change social environment; prevent
and control problem.]
When programs need both global and specific logic models, it is helpful to develop a global model
first. The detailed models can be seen as more specific “magnification” of parts of the program. As
in geographic mapping programs such as Mapquest, the user can “zoom in” or “zoom out” on an
underlying map. The family of related models ensures that all players are operating from a common
frame of reference. Even when some staff members are dealing with a discrete part of the program,
they are cognizant of where their part fits into the larger picture.
The provider immunization program is a good example of “zooming in” on portions of a more global
model. The first logic model (Exhibit 2.5) is a global one depicting all the activities and outcomes,
but highlighting the sequence from training activities to intended outcomes of training. The second
logic model magnifies this stream only, indicating some more detail related to implementation of
training activities.
[Figure: Exhibit 2.5, global logic model for the provider immunization program.
Activities: Develop newsletter; Distribute newsletter; Conduct trainings; Outreach; MD peer education
and rounds; Develop Tool Kit; Nurse Educator presentations to LHDs; LHD nurses do private provider
consults.
Outcomes: Providers read newsletters; Providers attend trainings and rounds; Providers receive and use
Tool Kits; Provider KAB increases; Providers know latest rules and policies; Providers know registry
and their role in it; Provider motivation to do immunization increases; Providers do more
immunizations; Increased coverage of target pop; Reduce VPD in target population.]
[Figure: second logic model, magnifying the training stream.
Activities: Do outreach; Promote and recruit participants; Do logistics; Do needs assessments; Develop
Tool Kit and training materials; Conduct trainings.
Outcomes: Providers attend trainings; Provider KAB increases; Providers know latest rules and
policies; Providers know registry and their role in it; Provider motivation to immunize increases.]
Standard / Questions
Utility
• Thinking about how the model will be used, is the level of detail appropriate
or is there too much or too little detail?
• Is the program description intelligible to those who need to use it to make
evaluation planning decisions?
Feasibility
• Does the program description include at least some activities and outcomes
that are in control of the program?
Propriety
• Is the evaluation complete and fair in assessing all aspects of the program,
including its strengths and weaknesses?
• Does the program description include enough detail to examine both
strengths and weaknesses, and unintended as well as intended outcomes?
Accuracy
• Is the program description comprehensive?
• Have you documented the context of the program so that likely influences
on the program can be identified?
Convert inputs, activities, outputs, and outcomes into a simple global logic model.
Activities: What will the program and its staff actually do?
Outcomes: What changes do we hope will result in someone or something other than the program and its
staff?
Activities Outcomes
After completing Steps 1 and 2, you and your stakeholders should have a clear understanding of the
program and have reached consensus. Now your evaluation team will need to focus the evaluation. This
includes determining the most important evaluation questions and the appropriate design for the
evaluation. Focusing the evaluation is based on the assumption that the entire program does not
need to be evaluated at any point in time. Rather, the “right” evaluation of the program depends on
what question is being asked, who is asking the question, and what will be done with the
information.
Since resources for evaluation are always limited, this chapter provides a series of decision criteria
to help you determine the best evaluation focus at any point in time. You will note that these criteria
are inspired by the evaluation standards: specifically, utility (who will use the results and what
information will be most useful to them) and feasibility (how much time and resources are available
for the evaluation).
The logic models developed in the prior step set the stage for determining the best evaluation focus.
The approach to evaluation focus in the CDC Evaluation Framework differs slightly from traditional
evaluation approaches. In the past, some programs tended to assume all evaluations were
“summative” ones, conducted when the program had run its course and intended to answer the
question, “Did the program work?” Consequently, a key question was, “Is the program ready for
evaluation?”
By contrast, the CDC Framework views evaluation as an ongoing activity over the life of a program
that asks, “Is the program working?” Hence, a program is always ready for some evaluation.
Because the logic model displays the program from inputs through activities/outputs through to the
sequence of outcomes from short-term to most distal, it can guide a discussion of what you can
expect to achieve at this point in the life of your project. Should you focus on distal outcomes, or
only on short- or mid-term ones? Or conversely, does a process evaluation make the most sense
right now?
Types of Evaluations
Many different questions can be part of a program evaluation, depending on how long the program
has been in existence, who is asking the question, and why the evaluation information is needed. In
general, evaluation questions for an existing program1 fall into one of the following groups:
1
There is another type of evaluation—“formative” evaluation—where the purpose of the evaluation is to gain insight into
the nature of the problem so that you can “formulate” a program or intervention to address it. While many steps of the
Framework will be helpful for formative evaluation, the emphasis in this manual is on instances wherein the details of the
program/intervention are already known even though it may not yet have been implemented.
• The locale where services or programs are provided (e.g., rural, urban)
• The number of people receiving services
• The economic status and racial/ethnic background of people receiving services
• The quality of services
• The actual events that occur while the services are delivered
• The amount of money the project is using
• The direct and in-kind funding for services
• The staffing for services or programs
• The number of activities and meetings
• The number of training sessions conducted
When evaluation resources are limited, only the most important issues of implementation fidelity can
be included. Here are some “usual suspects” that compromise implementation fidelity and should be
considered for inclusion in the process evaluation portion of the evaluation focus:
Our childhood lead poisoning logic model illustrates many of these potential process issues.
Reducing EBLL presumes the house will be cleaned, medical care referrals will be fulfilled, and
specialty medical care will be provided. All of these are transfers of accountability beyond the
program to the housing authority, the parent, and the provider, respectively. For provider training to
achieve its outcomes, it may presume completion of a three-session curriculum, which is a dosage
issue. Case management results in medical referrals, but it presumes adequate access to specialty
Effectiveness/Outcome
Outcome evaluations assess progress on the sequence of outcomes that the program is to address.
Programs often describe this sequence using terms like short-term, intermediate, and long-term
outcomes, or proximal (close to the intervention) or distal (distant from the intervention).
Depending on the stage of development of the program and the purpose of the evaluation, outcome
evaluations may include any or all of the outcomes in the sequence, including
While process and outcome evaluations are the most common, there are several other types of
evaluation questions that are central to a specific program evaluation. These include the following:
Efficiency: Are your program’s activities being produced with minimal use of resources such as
budget and staff time? What is the volume of outputs produced by the resources devoted to your
program?
Cost-Effectiveness: Does the value or benefit of your program’s outcomes exceed the cost of
producing them?
Attribution: Can the outcomes that are being produced be shown to be related to your program,
as opposed to other things that are going on at the same time?
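The efficiency and cost-effectiveness questions above come down to simple arithmetic once costs,
outputs, and the value of outcomes have been estimated. The sketch below shows that arithmetic under
stated assumptions: the function names are invented for this example, and the dollar amounts and
counts are hypothetical. In practice, placing a value on outcomes is the hard part and is not
addressed here.

```python
def cost_per_output(total_cost, outputs_produced):
    """Efficiency: resources spent for each unit of output produced."""
    if outputs_produced <= 0:
        raise ValueError("outputs_produced must be greater than zero")
    return total_cost / outputs_produced


def benefit_exceeds_cost(benefit_value, total_cost):
    """Cost-effectiveness: does the estimated value of outcomes exceed the cost?"""
    return benefit_value > total_cost


# Hypothetical figures for illustration only.
unit_cost = cost_per_output(total_cost=50_000, outputs_produced=1_000)
worthwhile = benefit_exceeds_cost(benefit_value=80_000, total_cost=50_000)

print(f"Cost per output: ${unit_cost:.2f}")
print(f"Benefit exceeds cost: {worthwhile}")
```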
All of these types of evaluation questions relate to some part, but not all, of the logic model.
Exhibits 3.1 and 3.2 show where in the logic model each type of evaluation would focus.
Implementation evaluations would focus on the inputs, activities, and outputs boxes and not be
concerned with performance on outcomes. Effectiveness evaluations would do the opposite—
focusing on some or all outcome boxes, but not necessarily on the activities that produced them.
Efficiency evaluations care about the arrows linking inputs to activities/outputs—how much output
is produced for a given level of inputs/resources. Attribution would focus on the arrows between
specific activities/outputs and specific outcomes—whether progress on the outcome is related to the
specific activity/output.
[Figure: Exhibit 3.1. Logic model boxes in sequence: Inputs; Activities; Outputs; Short-term
Effects/Outcomes; Intermediate Effects/Outcomes; Long-term Effects/Outcomes. The label
Process/Implementation covers the Inputs, Activities, and Outputs boxes; the label
Outcome/Effectiveness covers the three Effects/Outcomes boxes.]
Exhibit 3.2
Evaluation Domains — Arrows
[Figure: the same sequence of boxes joined by arrows: Inputs; Activities; Outputs; Short-term
Effects/Outcomes; Intermediate Effects/Outcomes; Long-term Effects/Outcomes.]
Utility Considerations
Feasibility Considerations
The first four questions help identify the most useful focus of the evaluation, but you must also
determine whether it is a realistic/feasible one. Three questions provide a “reality check” on our
desired focus:
The affordable housing example shows how the desired focus might be constrained by “reality.”
The elaborated logic model was important in this case because it clarified that, while program staff
were focused on production of new houses, important stakeholders like community-based
organizations and faith-based donors were committed to more distal outcomes such as changes in
life outcomes of families or the outcomes of outside investment in the community. The model
led to a discussion of reasonableness of expectations and, in the end, to expanded evaluation
indicators that included some of the more distal outcomes, but also to a greater appreciation by
stakeholders of the intermediate milestones on the way to their preferred outcomes.
• Scenario 1
At the 1-year mark, a neighboring community would like to adopt your program but
wonders, “What are we in for?” Here you might determine that questions of efficiency and
implementation are central to the evaluation. You would likely conclude this is a realistic
focus, given the stage of development and the intensity of the program. Questions about
outcomes would be premature.
• Scenario 2
At the 5-year mark, the auditing branch of your government funder wants to know, “Did
you spend our money well?” Clearly, this requires a much more comprehensive evaluation,
and would entail consideration of efficiency, effectiveness, possibly implementation, and
cost-effectiveness. It is not clear, without more discussion with the stakeholder, whether
research studies to determine causal attribution are also implied. Is this a realistic focus? At
year 5, probably yes. The program is a significant investment in resources and has been in
existence for enough time to expect some more distal outcomes to have occurred.
Note that in either scenario, you must also consider questions of interest to key stakeholders who are
not necessarily intended users of the results of the current evaluation. Here those were defined to be
advocates, who are concerned that families not be blamed for lead poisoning in their children, and
housing authority staff, who are concerned that amelioration include estimates of costs and
identification of less costly methods of lead reduction in homes. By year 5, these look like
reasonable questions to include in the evaluation focus. At year 1, stakeholders might need
assurance that you care about their questions, even if you cannot address them with this early
evaluation.
Three general types of research designs are commonly recognized: experimental, quasi-
experimental, and non-experimental/observational. Traditional program evaluation typically uses
the third type, but all three are presented here because, over the life of the program, traditional
evaluation approaches may need to be supplemented with other studies that look more like research.
Experimental designs use random assignment to compare the outcome of an intervention on one or
more groups with an equivalent group or groups that did not receive the intervention. For example,
you could select a group of similar schools, and then randomly assign some schools to receive a
prevention curriculum and other schools to serve as controls. All schools have the same chance of
being selected as an intervention or control school. Because of the random assignment, you reduce
the chances that the control and intervention schools vary in any way that could influence
differences in program outcomes. This allows you to attribute change in outcomes to your program.
For example, if the students in the intervention schools delayed onset of risk behavior longer than
students in the control schools, you could attribute the success to your program.
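Random assignment itself is mechanically simple. The sketch below shows one way it could be done for
the school example, so that every school has the same chance of landing in either group. The school
names, the group sizes, and the seed value are hypothetical; the seed is included only so that the
assignment can be reproduced and documented.

```python
import random


def randomly_assign(schools, n_intervention, seed=None):
    """Shuffle the schools and split them into intervention and control groups."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(schools)    # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    return {
        "intervention": sorted(shuffled[:n_intervention]),
        "control": sorted(shuffled[n_intervention:]),
    }


# Hypothetical pool of similar schools.
schools = [f"School {letter}" for letter in "ABCDEFGH"]
groups = randomly_assign(schools, n_intervention=4, seed=2005)

print("Intervention:", groups["intervention"])
print("Control:", groups["control"])
```

Selecting the pool of similar schools still comes first and remains a matter of judgment. The code
only handles the assignment step.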
However, in community settings it is hard, or sometimes even unethical, to have a true control
group. While there are some solutions that preserve the integrity of experimental design, another
option is to use a quasi-experimental design. These designs make comparisons between
nonequivalent groups and do not involve random assignment to intervention and control groups. An
example would be to assess adults’ beliefs about the harmful outcomes of environmental tobacco
smoke (ETS) in two communities, then conduct a media campaign in one of the communities. After
the campaign, you would reassess the adults and expect to find a higher percentage of adults
believing ETS is harmful in the community that received the media campaign. Critics could argue
that other differences between the two communities caused the changes in beliefs, so it is important
to document that the intervention and comparison groups are similar on key factors such as
population demographics and related current or historical events.
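A common way to summarize the two-community comparison is to subtract the change seen in the
comparison community from the change seen in the campaign community, a calculation often called
"difference-in-differences." That name and calculation are not used in this manual; they are offered
here as one possible sketch. The percentages below are hypothetical.

```python
def change(before, after):
    """Change in the percentage of adults who believe ETS is harmful."""
    return after - before


def difference_in_differences(campaign, comparison):
    """Change in the campaign community minus change in the comparison community."""
    return change(*campaign) - change(*comparison)


# Hypothetical survey results: (percent before, percent after).
campaign_community = (60.0, 75.0)
comparison_community = (58.0, 62.0)

effect = difference_in_differences(campaign_community, comparison_community)
print(f"Estimated campaign effect: {effect:.1f} percentage points")
```

This arithmetic does not answer the critics' concern. It is only as credible as the evidence that the
two communities were similar on key factors and experienced the same outside events.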
Related to quasi-experimental design, comparing outcomes/outcome data among states and between
one state and the nation as a whole are common and important ways to evaluate public health efforts.
Such comparisons will help you establish meaningful benchmarks for progress. States can also
compare their progress with that of states with a similar investment in their area of public health, or
they can contrast their outcomes with the results that could be expected if their programs were
similar to those of states with a larger investment.
Comparison data are also useful for measuring indicators in anticipation of new or expanding
programs. For example, noting a “lack of change” in key indicators over time prior to program
implementation helps demonstrate the need for your program and highlights the comparative
Observational designs are common in program evaluation. These include, but are not limited to,
time-series analysis, cross-sectional surveys, and case studies. Periodic cross-sectional surveys
(e.g., the YTS or BRFSS) can inform your evaluation. Case studies may be particularly appropriate
for assessing changes in public health capacity in disparate population groups. Case studies are
often applicable when the program is unique, when an existing program is used in a different setting,
when a unique outcome is being assessed, or when an environment is especially unpredictable. Case
studies can also allow for an exploration of community characteristics and how these may influence
program implementation, as well as identifying barriers to and facilitators of change.
This issue of “causal attribution,” while often a central research question, may or may not need to
supplement traditional program evaluation. The field of public health is under increasing pressure to
demonstrate that programs are worthwhile, effective, and efficient. During the last two decades,
knowledge and understanding about how to evaluate complex programs have increased significantly.
Nevertheless, because programs are so complex, the traditional research designs described here
may not be a good choice. As the World Health Organization notes, “the use of randomized control
trials to evaluate health promotion initiatives is, in most cases, inappropriate, misleading, and
unnecessarily expensive.”2
The design you select influences the timing of data collection, how you analyze the data, and the
types of conclusions you can make from your findings. A collaborative approach to focusing the
evaluation provides a practical way to better ensure the appropriateness and utility of your
evaluation design.
Standard Questions
Utility • What is the purpose of the evaluation?
• Who will use the evaluation results and how will they
use them?
• What are special needs of any other stakeholders
that must be addressed?
Feasibility • What is the program’s stage of development?
• How intense is the program?
• How measurable are the components in the
proposed focus?
Propriety • Will the focus and design adequately detect any
unintended consequences?
• Will the focus and design include examination of the
experience of those who are affected by the
program?
Accuracy • Is the focus broad enough to detect success or
failure of the program?
• Is the design the right one to respond to the
questions—such as attribution—that are being asked
by stakeholders?
Determine the components of your logic model that should be part of the focus given
these “utility” and “feasibility” considerations.
Review evaluation questions with stakeholders, program managers, and program staff.
Review options for the evaluation design, making sure that the design fits the
evaluation questions.
2 Who will use the evaluation results and for what purpose?
Now that you have developed a logic model, chosen an evaluation focus, and selected your
evaluation questions, your next task is to gather the evidence. The gathering of evidence for an
evaluation resembles the gathering of evidence for any research or data-oriented project, with a few
exceptions noted below.
• Indicators
• Sources of evidence/methods of data collection
• Quality
• Quantity
• Logistics
Developing Indicators
Because the components of our programs are often expressed in global or abstract terms, indicators
are specific, observable, and measurable statements that help define exactly what we mean or are
looking for. For example, the CLPP model includes global statements such as “Children receive
medical treatment” or “Families adopt in-home techniques.” The medical treatment indicator might
specify the type of medical treatment, the duration, or perhaps the adherence to the regimen.
Likewise, the family indicator might indicate the in-home techniques or the intensity or duration of
their adoption. For example, “Families with EBLL children clean all window sills and floors with
the designated cleaning solution each week” or “Families serve leafy green vegetables at three or
more meals per week.” Outcome indicators such as these provide clearer definitions of
the global statement and help guide the selection of data collection methods and the content of data
collection instruments.
The activities in your focus may also include global statements such as “good coalition,” “culturally
competent training,” and “appropriate quality patient care.” These activities would benefit from
elaboration into indicators, often called “process indicators.” What does “good” mean, what does
“quality” or “appropriate” mean?
3 Note that if you are developing your evaluation after completing an evaluation plan, you may already have developed
process or outcome objectives. If the objectives were written to be specific, measurable, action-oriented, realistic, and
time-bound (so-called “SMART” objectives), then they may serve as indicators as well.
Consider CDC’s immunization program, for example. The table below lists the components of the
logic model that were included in our focus in Step 3. Then each of these components has been
defined in one or more indicators.
Table 4.1
Provider Immunization Program:
Indicators for Program Component in Our Evaluation Focus
You may need to develop your own indicators or you may be able to draw on existing indicators
developed by others. Some large CDC programs have developed indicator inventories that are tied
to major activities and outcomes for the program. Advantages of these indicator inventories:
A key decision is whether there are existing data sources—secondary data collection—to measure
your indicators or whether you need to collect new data—primary data collection.
Depending on your evaluation questions and indicators, some secondary data sources may be
appropriate data collection sources. Some existing data sources that often come into play in
measuring outcomes of public health programs:
• Current Population Survey and other U.S. Census files
• Behavioral Risk Factor Surveillance System (BRFSS)
• Youth Risk Behavior Survey (YRBS)
• Pregnancy Risk Assessment Monitoring System (PRAMS)
• Cancer registries
• State vital statistics
• Various surveillance databases
• National Health Interview Survey (NHIS)
Before using secondary data sources, ensure that they meet your needs. Although large ongoing
surveillance systems have the advantages of collecting data routinely and having existing resources
and infrastructure, some of them (e.g., Current Population Survey [CPS]) have little flexibility with
regard to the questions asked in the survey, making it nearly impossible to use these systems to
collect the special data you may need for your evaluation. By contrast, other surveys such as BRFSS
or PRAMS are more flexible. For example, you might be able to add program-specific questions, or
you might expand the sample size for certain geographic areas or target populations, allowing for
more accurate estimates in smaller populations.
Primary data collection methods also fall into several broad categories. Among
the most common are:
• Surveys, including personal interviews, telephone, or instruments completed in person or
received through the mail or e-mail
• Group discussions/focus groups
• Observation
• Document review, such as medical records, but also diaries, logs, minutes of meetings, etc.
Choosing the “right” method from the many secondary and primary data collection choices requires
weighing both the context in which the question is asked (How much money can be devoted to collection and
measurement? How soon are results needed? Are there ethical considerations?) and the content of
the question (Is it a sensitive issue? Is it about a behavior that is observable? Is it something the
respondent is likely to know?).
Each method comes with advantages and disadvantages depending on the context and content of the
data collection (see Table 4.2).
Table 4.2
Advantages and Disadvantages of Various Survey Methods
The text box below lists possible sources of information for evaluations clustered in three broad
categories: people, observations, and documents.
When choosing data collection methods and sources, select those that meet your project’s needs.
Try to avoid choosing a data method/source that may be familiar or popular but does not necessarily
answer your questions. Keep in mind that budget issues alone should not drive your evaluation
planning efforts.
The four evaluation standards can help you reduce the enormous number of data collection options
to a more manageable set that best fits your data collection situation. Here is a checklist of
issues — based on the evaluation standards — that will help you choose appropriately:
Utility
• Purpose and use of data collection: Do you seek a “point in time” determination of a
behavior, or to examine the range and variety of experiences, or to tell an in-depth story?
• Users of data collection: Will some methods make the data more credible with skeptics or
key users than others?
Propriety
• Characteristics of the respondents: Will issues such as literacy or language make some
methods preferable to others?
• Degree of intrusion to program/participants: Will the data collection method disrupt the
program or be seen as intrusive by participants?
• Other ethical issues: Are there issues of confidentiality or safety of the respondent in seeking
answers to questions on this issue?
Accuracy
• Nature of the issue: Is it about a behavior that is observable?
• Sensitivity of the issue: How open and honest will respondents be in responding to the
questions on this issue?
• Respondent knowledge: Is it something the respondent is likely to know?
Different methods reveal different aspects of the program. Consider some interventions related to
tobacco control:
• You might include a group assessment of a school-based tobacco control program to hear the
group’s viewpoint, as well as individual student interviews to get a range of opinions.
• You might conduct a survey of all legislators in a state to gauge their interest in managed
care support of cessation services and products, and you might also interview certain
legislators individually to question them in greater detail.
When the outcomes under investigation are very abstract or no one quality data source exists,
combining methods maximizes the strengths and minimizes the limitations of each method. Using
multiple or mixed methods can increase the cross-checks on different subsets of findings and
generate increased stakeholder confidence in the overall findings.
Table 4.3
Provider Immunization Education Program:
Data Collection Methods and Sources for Indicators
Quality of Data
A quality evaluation produces data that are reliable, valid, and informative. An evaluation is reliable
to the extent that it repeatedly produces the same results, and it is valid if it measures what it is
intended to measure. The advantage of using existing data sources such as the BRFSS, YRBS, or
PRAMS is that they have been pretested and designed to produce valid and reliable data. If you are
designing your own evaluation tools, you should be aware of the factors that influence data quality:
• The design of the data collection instrument and how questions are worded
• The data collection procedures
• Training of data collectors
• The selection of data sources
• How the data are coded
• Data management
• Routine error checking as part of data quality control
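Routine error checking, the last factor in the list above, often starts with simple range and completeness checks on each record. The sketch below is a minimal illustration; the field names, valid ranges, and sample records are hypothetical and would need to match your own data collection instrument.

```python
def check_record(record, valid_ranges):
    """Return a list of problems found in one survey record.

    A field is flagged if it is missing or falls outside its valid range.
    """
    problems = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


# Hypothetical fields and plausible ranges for a household survey.
VALID_RANGES = {
    "age": (18, 110),
    "meals_with_greens_per_week": (0, 21),
}

records = [
    {"id": 1, "age": 34, "meals_with_greens_per_week": 4},
    {"id": 2, "age": 340, "meals_with_greens_per_week": 2},   # likely keying error
    {"id": 3, "age": 51},                                     # item left blank
]

# Keep only the records that have at least one problem.
flagged = {}
for record in records:
    problems = check_record(record, VALID_RANGES)
    if problems:
        flagged[record["id"]] = problems

for record_id, problems in flagged.items():
    print(record_id, problems)
```

Running checks like these as data are entered, rather than at the end of the evaluation, makes it easier to go back to the source and correct errors.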
Quantity of Data
You will also need to determine the amount of data you want to collect during the evaluation. There
are cases where you will need data of the highest validity and reliability, especially when traditional
program evaluation is being supplemented with research studies. But there are other instances where
the insights from a few cases or a convenience sample may be appropriate. If you use secondary
data sources, many issues related to quality of data—such as sample size—have already been
determined. If you are designing your own data collection tool and your examination of your
program includes research as well as evaluation questions, the quantity of data you need to collect
(i.e., sample sizes) will vary with the level of detail and the types of comparisons you hope to make.
You will also need to determine the jurisdictional level for which you are gathering the data (e.g.,
state, county, region, congressional district). Counties often appreciate and want county-level
estimates; however, this usually means larger sample sizes and more expense. Finally, consider the
size of the change you are trying to detect. In general, detecting small amounts of change requires
larger sample sizes. For example, detecting a 5% increase would require a larger sample size than
detecting a 10% increase. You may need the help of a statistician to determine adequate sample
size.
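To see why smaller changes require larger samples, the sketch below applies a standard textbook approximation for comparing two independent proportions. It is illustrative only and rests on several assumptions not stated in this manual: a baseline of 50%, the "5%" and "10%" increases read as percentage-point changes, a two-sided significance level of 0.05, and 80% power. A statistician may recommend a different formula for your design.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate respondents needed per group to detect a change from p1 to p2.

    Uses the common normal-approximation formula for two independent
    proportions with a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)            # value tied to desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical baseline of 50%, rising by 5 or by 10 percentage points.
n_small_change = sample_size_per_group(0.50, 0.55)
n_large_change = sample_size_per_group(0.50, 0.60)

print("Per-group sample to detect a 5-point increase: ", n_small_change)
print("Per-group sample to detect a 10-point increase:", n_large_change)
```

Under these assumptions, halving the size of the change you want to detect roughly quadruples the required sample, which is why county-level or subgroup estimates can become expensive quickly.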
In outlining procedures for collecting the evaluation data, consider these issues:
• When will you collect the data? You will need to determine when (and at what intervals) it
is most appropriate to collect the information. If you are measuring whether your objectives
have been met, your objectives will provide guidance as to when to collect certain data. If
you are evaluating specific program interventions, you might want to obtain information
from participants before they begin the program, upon completion of the program, and
several months after the program. If you are assessing the effects of a community campaign,
you might want to assess community knowledge, attitudes, and behaviors among your target
audience before and after the campaign.
• Who will be considered a participant in the evaluation? Are you targeting a relatively
specific group (African-American young people), or are you assessing trends among a more
You may already have answered some of these questions while selecting your data sources and
methods.
Standard Questions
Utility • Have key stakeholders been consulted who can assist with
access to respondents?
• Are methods and sources appropriate to the intended purpose
and use of the data?
• Have key stakeholders been consulted to ensure there are no
preferences for or obstacles to selected methods or sources?
• Are there specific methods or sources that will enhance the
credibility of the data with key users and stakeholders?
Feasibility • Can the data methods and sources be implemented within the
time and budget for the project?
• Does the evaluation team have the expertise to implement the
chosen methods?
• Are the methods and sources consistent with the culture and
characteristics of the respondents, such as language and literacy
level?
• Are logistics and protocols realistic given the time and resources
that can be devoted to data collection?
Propriety • Will data collection be unduly disruptive?
• Are there issues of safety of respondents or confidentiality that
must be addressed?
• Are the methods and sources appropriate to the culture and
characteristics of the respondents—will they understand what
they are being asked?
Accuracy • Are appropriate QA procedures in place to ensure quality of
data collection?
• Are enough data being collected, i.e., to support chosen
confidence levels or statistical power?
• Are methods and sources consistent with the nature of the
problem, the sensitivity of the issue, and the knowledge level of
the respondents?
Determine whether existing indicators will suffice or whether new ones must be
developed.
Consider the range of data sources and choose the most appropriate one.
Consider the range of data collection methods and choose those best suited to your
context and content.
Whether your evaluation is conducted to show program effectiveness, help improve the program, or
demonstrate accountability, you will need to analyze and interpret the evidence gathered in Step 4.
Step 5 encompasses analyzing the evidence, making claims about the program based on the analysis,
and justifying the claims by comparing the evidence against stakeholder values.
[Figure: Analyze and Synthesize Findings; Interpret Findings; Make Judgments; Identify Program Standards]
The complicating factor, of course, is that different stakeholders may bring different and even
contradictory standards and values to the table. As the old adage states, “where you stand depends
on where you sit.” Fortunately for those using the CDC Framework, the work of Step 5 benefits
from the efforts of the previous steps: Differences in values and standards will have been identified
during stakeholder engagement in Step 1. Those stakeholder perspectives will also have been
reflected in the program description and evaluation focus.
• Enter the data into a database and check for errors. If you are using a surveillance system
such as BRFSS or PRAMS, the data have already been checked, entered, and tabulated by
In evaluations that use multiple methods, patterns in evidence are detected by isolating important
findings (analysis) and combining different sources of information to reach a larger understanding
(synthesis).
• Needs of participants
• Community values, expectations, and norms
• Program mission and objectives
• Program protocols and procedures
• Performance by similar programs
• Performance by a control or comparison group
• Resource efficiency
• Mandates, policies, regulations, and laws
• Judgments of participants, experts, and funders
• Institutional goals
• Social equity
• Human rights.
Conflicting claims about a program’s quality, value, or importance often indicate that stakeholders
are using different program standards or values in making their judgments. This type of
disagreement can prompt stakeholders to clarify their values and reach consensus on how the
program should be judged.
Source: US Department of Health and Human Services. Introduction to program evaluation for comprehensive
tobacco control programs. Atlanta, GA: US Department of Health and Human Services, Centers for Disease
Control and Prevention, Office on Smoking and Health, November 2001.
Standard Questions
Utility Have you carefully described the perspectives, procedures, and rationale used to
interpret the findings?
Have stakeholders considered different approaches for interpreting the findings?
Feasibility Is the approach to analysis and interpretation appropriate to the level of expertise
and resources?
Propriety Have the standards and values of those less powerful or those most affected by
the program been taken into account in determining standards for success?
Accuracy Can you explicitly justify your conclusions?
Are the conclusions fully understandable to stakeholders?
If multiple methods have been employed, compare different methods for consistency in
findings.
Use existing standards (e.g., Healthy People 2010 objectives) as a starting point for
comparisons.
Question Response
The ultimate purpose of program evaluation is to use the information to improve programs. The
purpose(s) you identified early in the evaluation process should guide the use of the evaluation
results. The evaluation results can be used to demonstrate the effectiveness of your program,
identify ways to improve your program, modify program planning, demonstrate accountability, and
justify funding.
• To demonstrate to legislators or other stakeholders that resources are being well spent and
that the program is effective.
• To aid in forming budgets and justify the allocation of resources.
• To compare outcomes with those of previous years.
• To compare actual outcomes with intended outcomes.
• To suggest realistic intended outcomes.
• To support annual and long-range planning.
• To focus attention on issues important to your program.
• To promote your program.
• To identify partners for collaborations.
• To enhance the image of your program.
• To retain or increase funding.
• To provide direction for program staff.
• To identify training and technical assistance needs.
What’s involved in ensuring use and sharing lessons learned? Five elements are important in
making sure that the findings from an evaluation are used:
• Recommendations
• Preparation
• Feedback
• Follow-up
• Dissemination
Making Recommendations
Recommendations are actions to consider as a result of an evaluation. Recommendations can
strengthen an evaluation when they anticipate and react to what users want to know, and may
undermine an evaluation’s credibility if they are not supported by enough evidence, or are not in
keeping with stakeholders’ values.
Audience: Legislators.
Purpose of Evaluation: Demonstrate effectiveness.
Recommendation: Last year, a targeted education and media campaign about the need for private
provider participation in adult immunization was conducted across the state. Eighty percent of
providers were reached by the campaign and reported a change in attitudes towards adult
immunization—a twofold increase from the year before. We recommend the campaign be continued
and expanded to include an emphasis on minimizing missed opportunities of providers to conduct
adult immunizations.
Preparation
Preparation refers to the steps taken to get ready to eventually use the evaluation findings. Through
preparation, stakeholders can:
Feedback
Feedback is the communication that occurs among everyone involved in the evaluation. Feedback,
necessary at all stages of the evaluation process, creates an atmosphere of trust among stakeholders.
Early in an evaluation, the process of giving and receiving feedback keeps an evaluation on track by
keeping everyone informed about how the program is being implemented and how the evaluation is
proceeding. As the evaluation progresses and preliminary results become available, feedback helps
ensure that primary intended users and other stakeholders have opportunities to comment on
evaluation decisions. Valuable feedback can be obtained by holding discussions and routinely
sharing interim findings, provisional interpretations, and draft reports.
Follow-up
Although follow-up refers to the support that many users need throughout the evaluation process, in
this step, in particular, it refers to the support that is needed after users receive evaluation results and
begin to reach and justify their conclusions. Active follow-up can achieve the following:
Table 6.1
Standards for Step 6:
Ensure Use and Share Lessons Learned
Standard Questions
Utility • Do reports clearly describe the program, including its context, and the evaluation’s
purposes, procedures, and findings?
• Have you shared significant mid-course findings and reports with users so that the
findings can be used in a timely fashion?
• Have you planned, conducted, and reported the evaluation in ways that encourage
follow-through by stakeholders?
Feasibility • Is the format appropriate to your resources and to the time and resources of the
audience?
Propriety • Have you ensured that the evaluation findings (including the limitations) are made
accessible to everyone affected by the evaluation and others who have the right to
receive the results?
Accuracy • Have you tried to avoid the distortions that can be caused by personal feelings and other
biases?
• Do evaluation reports impartially and fairly reflect evaluation findings?
Evaluation is a practical tool that states can use to inform programs’ efforts and assess their impact.
Program evaluation should be well integrated into the day-to-day planning, implementation, and
management of public health programs. Program evaluation complements CDC’s operating
principles for public health, which include using science as a basis for decision-making and action,
expanding the quest for social equity, performing effectively as a service agency, and making efforts
outcome-oriented. These principles highlight the need for programs to develop clear plans, inclusive
partnerships, and feedback systems that support ongoing improvement. CDC is committed to
providing additional tools and technical assistance to states and partners to build and enhance their
capacity for evaluation.
Identify strategies to increase the likelihood that evaluation findings will be used.
Worksheet 6B
Ensuring Follow-up
Accuracy: The extent to which an evaluation is truthful or valid in what it says about a program,
project, or material.
Activities: The actual events or actions that take place as a part of the program.
Attribution: The estimation of the extent to which any results observed are caused by a program,
meaning that the program has produced incremental effects.
Case study: A data collection method that involves in-depth studies of specific cases or projects
within a program. The method itself is made up of one or more data collection methods (such as
interviews and file review).
Causal inference: The logical process used to draw conclusions from evidence concerning what
has been produced or “caused” by a program. To say that a program produced or caused a certain
result means that, if the program had not been there (or if it had been there in a different form or
degree), then the observed result (or level of result) would not have occurred.
Comparison group: A group not exposed to a program or treatment. Also referred to as a control
group.
Comprehensiveness: Full breadth and depth of coverage on the evaluation issues of interest.
Conclusion validity: The ability to generalize the conclusions about an existing program to other
places, times, or situations. Both internal and external validity issues must be addressed if such
conclusions are to be reached.
Confidence level: A statement that the true value of a parameter for a population lies within a
specified range of values with a certain level of probability.
Control group: In quasi-experimental designs, a group of subjects who receive all influences
except the program in exactly the same fashion as the treatment group (the latter called, in some
circumstances, the experimental or program group). Also referred to as a non-program group.
Cost-benefit analysis: An analysis that combines the benefits of a program with the costs of the
program. The benefits and costs are transformed into monetary terms.
Cross-sectional data: Data collected at one point in time from various entities.
Data collection method: The way facts about a program and its outcomes are amassed. Data
collection methods often used in program evaluations include literature search, file review, natural
observations, surveys, expert opinion, and case studies.
Descriptive statistical analysis: Numbers and tabulations used to summarize and present
quantitative information concisely.
Diffusion or imitation of treatment: Respondents in one group get the effect intended for the
treatment (program) group. This is a threat to internal validity.
Direct analytic methods: Methods used to process data to provide evidence on the direct impacts
or outcomes of a program.
Evaluation design: The logical model or conceptual framework used to arrive at conclusions about
outcomes.
Evaluation plan: A written document describing the overall approach or design that will be used to
guide an evaluation. It includes what will be done, how it will be done, who will do it, when it will
be done, why the evaluation is being conducted, and how the findings will likely be used.
Evaluation strategy: The method used to gather evidence about one or more outcomes of a
program. An evaluation strategy is made up of an evaluation design, a data collection method, and
an analysis technique.
Expert opinion: A data collection method that involves using the perceptions and knowledge of
experts in functional areas as indicators of program outcome.
External validity: The ability to generalize conclusions about a program to future or different
conditions. Threats to external validity include selection and program interaction, setting and
program interaction, and history and program interaction.
File review: A data collection method involving a review of program files. There are usually two
types of program files: general program files and files on individual projects, clients, or participants.
Focus group: A group of people selected for their relevance to an evaluation that is engaged by a
trained facilitator in a series of discussions designed for sharing insights, ideas, and observations on
a topic of concern.
History: Events outside the program that affect the responses of those involved in the program.
History and program interaction: The conditions under which the program took place are not
representative of future conditions. This is a threat to external validity.
Ideal evaluation design: The conceptual comparison of two or more situations that are identical
except that in one case the program is operational. Only one group (the treatment group) receives
the program; the other groups (the control groups) are subject to all pertinent influences except for
the operation of the program, in exactly the same fashion as the treatment group. Outcomes are
measured in exactly the same way for both groups and any differences can be attributed to the
program.
Implicit design: A design with no formal control group and where measurement is made after
exposure to the program.
Indicator: A specific, observable, and measurable characteristic or change that shows the progress
a program is making toward achieving a specified outcome.
Inferential statistical analysis: Statistical analysis using models to confirm relationships among
variables of interest or to generalize findings to an overall population.
Informal conversational interview: An interviewing technique that relies on the natural flow of a
conversation to generate spontaneous questions, often as part of an ongoing observation of the
activities of a program.
Instrumentation: The effect of changing measuring instruments from one measurement to another,
as when different interviewers are used. This is a threat to internal validity.
Interaction effect: The joint net effect of two (or more) variables affecting the outcome of a quasi-
experiment.
Internal validity: The ability to assert that a program has caused measured results (to a certain
degree), in the face of plausible potential alternative explanations. The most common threats to
internal validity are history, maturation, mortality, selection bias, regression artifacts, diffusion or
imitation of treatment, and testing.
Interviewer bias: The influence of the interviewer on the interviewee. This may result from
several factors, including the physical and psychological characteristics of the interviewer, which
may affect the interviewees and cause differential responses among them.
List sampling: Usually in reference to telephone interviewing, a technique used to select a sample.
The interviewer starts with a sampling frame containing telephone numbers, selects a unit from the
frame, and conducts an interview over the telephone either with a specific person at the number or
with anyone at the number.
Literature search: A data collection method that involves an identification and examination of
research reports, published papers, and books.
Logic model: A systematic and visual way to present the perceived relationships among the
resources you have to operate the program, the activities you plan to do, and the changes or results
you hope to achieve.
Longitudinal data: Data collected over a period of time, sometimes involving a stream of data for
particular persons or entities over time.
Macro-economic model: A model of the interactions between the goods, labor, and assets markets
of an economy. The model is concerned with the level of outputs and prices based on the
interactions between aggregate demand and supply.
Matching: Dividing the population into “blocks” in terms of one or more variables (other than the
program) that are expected to have an influence on the impact of the program.
Maturation: Changes in the outcomes that are a consequence of time rather than of the program,
such as participant aging. This is a threat to internal validity.
Measuring devices or instruments: Devices that are used to collect data (such as questionnaires,
interview guidelines, and observation record forms).
Micro-economic model: A model of the economic behavior of individual buyers and sellers, in a
specific market and set of circumstances.
Monetary policy: Government action that influences the money supply and interest rates. May
also take the form of a program.
Mortality: Treatment (or control) group participants dropping out of the program. It can
undermine the comparability of the treatment and control groups and is a threat to internal validity.
Multiple lines of evidence: The use of several independent evaluation strategies to address the
same evaluation issue, relying on different data sources, on different analytical methods, or on both.
Natural observation: A data collection method that involves on-site visits to locations where a
program is operating. It directly assesses the setting of a program, its activities, and individuals who
participate in the activities.
Non-probability sampling: Sampling in which units are chosen in such a way that not every unit in
the population has a calculable non-zero probability of being selected in the sample.
Non-response bias: Potential skewing because of non-response. The answers from sampling units
that do produce information may differ on items of interest from the answers from the sampling units
that do not reply.
Non-sampling error: The errors, other than those attributable to sampling, that arise during the
course of almost all survey activities (even a complete census), such as respondents’ different
interpretation of questions, mistakes in processing results, or errors in the sampling frame.
Objective data: Observations that do not involve personal feelings and are based on observable
facts. Objective data can be measured quantitatively or qualitatively.
Objectivity: Evidence and conclusions that can be verified by someone other than the original
authors.
Order bias: A skewing of results caused by the order in which questions are placed in a survey.
Outcome effectiveness issues: A class of evaluation issues concerned with the achievement of a
program’s objectives and the other impacts and effects of the program, intended or unintended.
Outcomes: The results of program operations or activities; the effects triggered by the program.
(For example, increased knowledge, changed attitudes or beliefs, reduced tobacco use, reduced TB
morbidity and mortality.)
Outputs: The direct products of program activities; immediate measures of what the program did.
Plausible hypotheses: Likely alternative explanations or ways of accounting for program results,
meaning those involving influences other than the program.
Primary data: Data collected by an evaluation team specifically for the evaluation study.
Probability sampling: The selection of units from a population based on the principle of
randomization. Every unit of the population has a calculable (non-zero) probability of being
selected.
Process evaluation: The systematic collection of information to document and assess how a
program was implemented and operates.
Program evaluation: The systematic collection of information about the activities, characteristics,
and outcomes of programs to make judgments about the program, improve program effectiveness,
and/or inform decisions about future program development.
Propriety: The extent to which the evaluation has been conducted in a manner that evidences
uncompromising adherence to the highest principles and ideals (including professional ethics, civil
law, moral code, and contractual agreements).
Qualitative data: Observations that are categorical rather than numerical, and often involve
knowledge, attitudes, perceptions, and intentions.
Quasi-experimental design: Study structures that use comparison groups to draw causal inferences
but do not use randomization to create the treatment and control groups. The treatment group is
usually given. The control group is selected to match the treatment group as closely as possible so
that inferences on the incremental impacts of the program can be made.
Randomization: Use of a probability scheme for choosing a sample. This can be done using
random number tables, computers, dice, cards, and so forth.
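As one illustration of computer-based randomization, the sketch below draws a simple random sample
from a sampling frame. The frame contents, the function name draw_simple_random_sample, and the fixed
seed are assumptions made for this example only.

```python
import random


def draw_simple_random_sample(frame, sample_size, seed=None):
    """Return sample_size units chosen without replacement from frame.

    Every unit in the frame has the same chance of being selected.
    """
    if sample_size > len(frame):
        raise ValueError("sample_size cannot exceed the size of the frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(frame), sample_size)


# Hypothetical sampling frame of 500 participant IDs.
frame = [f"participant-{i:03d}" for i in range(1, 501)]
sample = draw_simple_random_sample(frame, sample_size=50, seed=2005)
print(len(sample), "units selected")
```

Recording the seed alongside the sample lets a reviewer reproduce the same draw later.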
Regression artifacts: Pseudo-changes in program results occurring when persons or treatment units
have been selected for the program on the basis of their extreme scores. Regression artifacts are a
threat to internal validity.
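A small simulation can make regression artifacts concrete. In the sketch below, people are enrolled
because of extremely low pretest scores, and their posttest average moves back toward the overall
mean even though no program effect is simulated. The score distributions, group sizes, and seed are
all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Each person has a stable "true" score; every measurement adds independent noise.
true_scores = [rng.gauss(50, 10) for _ in range(10_000)]
pretest = [t + rng.gauss(0, 10) for t in true_scores]
posttest = [t + rng.gauss(0, 10) for t in true_scores]  # no program effect at all

# Enroll the 10% of people with the lowest pretest scores.
cutoff = sorted(pretest)[len(pretest) // 10]
enrolled = [i for i, score in enumerate(pretest) if score < cutoff]

pre_mean = statistics.mean(pretest[i] for i in enrolled)
post_mean = statistics.mean(posttest[i] for i in enrolled)
print(f"pretest mean of enrolled group:  {pre_mean:.1f}")
print(f"posttest mean of enrolled group: {post_mean:.1f}")
```

The apparent improvement comes entirely from selecting on an extreme, noisy measurement, which is
why a comparison group selected the same way is needed to rule this explanation out.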
Reliability: The extent to which a measurement, when repeatedly applied to a given situation,
consistently produces the same results if the situation does not change between the applications.
Reliability can refer to the stability of the measurement over time or to the consistency of the
measurement from place to place.
Replicate sampling: A probability sampling technique that involves the selection of a number of
independent samples from a population rather than one single sample. Each of the smaller samples
is termed a replicate and is independently selected on the basis of the same sample design.
Resources: Assets available and anticipated for operations. They include people, equipment,
facilities, and other things used to plan, implement, and evaluate programs.
Sample size formula: An equation that varies with the type of estimate to be made, the desired
precision of the sample and the sampling method, and which is used to determine the required
minimum sample size.
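As a hedged example, one widely used formula for estimating a proportion from a simple random sample
is n = z^2 * p * (1 - p) / e^2, optionally adjusted for a finite population. The function name,
default values, and population size below are assumptions for illustration; other estimates and
sampling methods call for different formulas.

```python
import math


def sample_size_for_proportion(margin_of_error, expected_proportion=0.5,
                               z_score=1.96, population_size=None):
    """Minimum simple-random-sample size for estimating a proportion.

    margin_of_error     desired half-width of the confidence interval (e.g. 0.05)
    expected_proportion anticipated proportion; 0.5 gives the most conservative size
    z_score             1.96 corresponds to roughly 95% confidence
    population_size     if given, apply the finite population correction
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if not 0 < expected_proportion < 1:
        raise ValueError("expected_proportion must be between 0 and 1")
    p = expected_proportion
    n = (z_score ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)  # round up so the precision target is still met


print(sample_size_for_proportion(0.05))
print(sample_size_for_proportion(0.05, population_size=2000))
```

Tightening the margin of error raises the required size quickly, because the margin enters the
formula squared.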
Sampling error: The error attributed to sampling and measuring a portion of the population rather
than carrying out a census under the same general conditions.
Sampling frame: Complete list of all people or households in the target population.
Sampling method: The method by which the sampling units are selected (such as systematic or
stratified sampling).
Sampling unit: The unit used for sampling. The population should be divisible into a finite number
of distinct, non-overlapping units, so that each member of the population belongs to only one
sampling unit.
Secondary data: Data collected and recorded by another (usually earlier) person or organization,
usually for different purposes than the current evaluation.
Selection bias: When the treatment and control groups involved in the program are initially
statistically unequal in terms of one or more of the factors of interest. This is a threat to internal
validity.
Setting and program interaction: When the setting of the experimental or pilot project is not
typical of the setting envisioned for the full-scale program. This interaction is a threat to external
validity.
Stakeholders: People or organizations that are invested in the program or that are interested in the
results of the evaluation or what will be done with results of the evaluation.
Standard: A principle commonly agreed to by experts in the conduct and use of an evaluation for
the measure of the value or quality of an evaluation (e.g., accuracy, feasibility, propriety, utility).
Standard deviation: A measure of the spread of a set of numerical measurements (on an “interval
scale”). It indicates how closely individual measurements cluster around the mean.
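The short sketch below computes a standard deviation with Python's standard library. The scores are
made-up values used only to show the calculation.

```python
import statistics

# Hypothetical knowledge-test scores for eight participants.
scores = [62, 70, 68, 75, 80, 66, 71, 68]

mean_score = statistics.mean(scores)
sample_sd = statistics.stdev(scores)       # divides by n - 1 (estimate from a sample)
population_sd = statistics.pstdev(scores)  # divides by n (scores are the whole population)

print(f"mean = {mean_score}")
print(f"sample standard deviation = {sample_sd:.2f}")
print(f"population standard deviation = {population_sd:.2f}")
```

A smaller value means the scores cluster more tightly around the mean. Which of the two versions
applies depends on whether the measurements are a sample or the entire population of interest.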
Statistical model: A model that is normally based on previous research and permits transformation
of a specific impact measure into another specific impact measure, one specific impact measure into
a range of other impact measures, or a range of impact measures into a range of other impact
measures.
Statistically significant effects: Effects that are observed and are unlikely to result solely from
chance variation. These can be assessed through the use of statistical tests.
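One of many possible statistical tests is a permutation test, sketched below for a difference in
group means. It is offered as an illustration under stated assumptions rather than a recommended
method: the outcome values, the function name permutation_p_value, and the number of permutations
are all hypothetical.

```python
import random
import statistics


def permutation_p_value(treatment, comparison, n_permutations=5000, seed=None):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly to see how often chance alone
    produces a difference at least as large as the one observed.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(treatment) - statistics.mean(comparison))
    pooled = list(treatment) + list(comparison)
    n_treat = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_treat])
                   - statistics.mean(pooled[n_treat:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores for a treatment group and a comparison group.
treatment = [12, 15, 14, 16, 13, 17, 15, 14]
comparison = [10, 11, 12, 9, 13, 10, 11, 12]
p_value = permutation_p_value(treatment, comparison, seed=7)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value indicates only that chance variation is an unlikely explanation. It does not rule
out the other threats to internal validity described in this glossary.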
Stratified sampling: A probability sampling technique that divides a population into relatively
homogeneous layers called strata, and selects appropriate samples independently in each of those
layers.
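The sketch below shows one way to carry out stratified sampling, using the same sampling fraction in
every stratum (proportional allocation). The clinic records, the "setting" field, and the function
name stratified_sample are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, sampling_fraction, seed=None):
    """Sample the same fraction independently within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    selected = {}
    for name in sorted(strata):
        members = strata[name]
        # Take at least one unit per stratum so that no stratum is left out.
        k = max(1, round(len(members) * sampling_fraction))
        selected[name] = rng.sample(members, k)
    return selected


# Hypothetical frame: 300 clinics, each labelled with a setting.
clinics = (
    [{"id": i, "setting": "urban"} for i in range(200)]
    + [{"id": i, "setting": "rural"} for i in range(200, 300)]
)
sample = stratified_sample(clinics, lambda c: c["setting"], 0.10, seed=1)
for setting, chosen in sample.items():
    print(setting, len(chosen))
```

Because each stratum is sampled separately, smaller groups such as the rural clinics here are
guaranteed representation in the sample.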
Subjective data: Observations that involve personal feelings, attitudes, and perceptions. Subjective
data can be measured quantitatively or qualitatively.
Testing bias: Changes observed in a quasi-experiment that may be the result of excessive
familiarity with the measuring instrument. This is a potential threat to internal validity.
Treatment group: In research design, the group of subjects that receives the program. Also
referred to as the experimental or program group.
Utility: The extent to which an evaluation produces and disseminates reports that inform relevant
audiences and have beneficial impact on their work.
Selected Publications
Connell JP, Kubisch AC, Schorr LB, Weiss CH. New approaches to evaluating community
initiatives. New York, NY: Aspen Institute, 1995.
Fawcett SB, Paine-Andrews A, Francisco VT, Schulz J, Richter KP, et al. Evaluating community
initiatives for health and development. In: Rootman I, Goodstadt M, Hyndman B, et al., eds.
Evaluating Health Promotion Approaches. Copenhagen, Denmark: World Health Organization
(Euro), 1999 (In press).
Fawcett SB, Sterling TD, Paine-Andrews A, Harris KJ, Francisco VT, et al. Evaluating community
efforts to prevent cardiovascular diseases. Atlanta, GA: Centers for Disease Control and Prevention,
National Center for Chronic Disease Prevention and Health Promotion, 1995.
Fetterman DM, Kaftarian SJ, Wandersman A. Empowerment evaluation: Knowledge and tools for
self-assessment and accountability. Thousand Oaks, CA: Sage Publications, 1996.
Patton MQ. Utilization-focused evaluation. Thousand Oaks, CA: Sage Publications, 1997.
Rossi PH, Freeman HE, Lipsey MW. Evaluation: A systematic approach. Newbury Park, CA: Sage
Publications, 1999.
Shadish WR, Cook TD, Leviton LC. Foundations of program evaluation. Newbury Park, CA: Sage
Publications, 1991.
University of Toronto, Health Communication Unit at the Center for Health Promotion. Evaluating
health promotion programs (see Web-based entry on page 66).