Improving Your Test Questions: I. Choosing Between Objective and Subjective Test Items
(from: http://www.cte.uiuc.edu/dme/exams/ITQ.html)
Table of Contents
1. Choosing between Objective and Subjective Test Items
2. Suggestions for Using and Writing Test Items
Multiple Choice
True-False
Matching
Completion
Essay
Problem Solving
Performance
3. Two Methods for Assessing Test Item Quality
4. References for Further Reading
There are two general categories of test items: (1) objective items which require students to select the
correct response from several alternatives or to supply a word or short phrase to answer a question or
complete a statement; and (2) subjective or essay items which permit the student to organize and
present an original answer. Objective items include multiple-choice, true-false, matching and
completion, while subjective items include short-answer essay, extended-response essay, problem
solving and performance test items. For some instructional purposes, one item type may prove more
efficient and appropriate than the other. To begin our discussion of the relative merits of each type of
test item, test your knowledge of these two item types by answering the following questions.
1. TRUE Essay items are generally easier and less time consuming to construct than are most
objective test items. Technically correct, content-appropriate multiple-choice and true-false
test items require an extensive amount of time to write and revise. For example, a
professional item writer produces only 9-10 good multiple-choice items in a day.
2. ? According to research findings, it is still undetermined whether essay tests require
or facilitate more thorough (or even different) student study preparation.
3. TRUE Writing skills do affect a student's ability to communicate the correct "factual"
information through an essay response. Consequently, students with good writing skills
have an advantage over students who have difficulty expressing themselves through
writing.
4. FALSE Essays do not teach a student how to write, but they can emphasize the importance of
being able to communicate through writing. Constant use of essay tests may encourage the
knowledgeable student who writes poorly to improve his/her writing ability in order to
improve performance.
5. TRUE Essays are more subjective in nature because of their susceptibility to scoring influences.
Different readers can rate identical responses differently; the same reader can rate the
same paper differently over time; handwriting, neatness or punctuation can
unintentionally affect a paper's grade; and the lack of anonymity can affect the grading
process. While impossible to eliminate, scoring influences or biases can be minimized
through procedures discussed later in this booklet.
6. ? Both item types encourage some form of guessing. Multiple-choice, true-false and
matching items can be answered correctly through blind guessing, while essay items can be
answered satisfactorily through well-written bluffing.
7. TRUE Because students need considerable time to respond to each essay question, only a
few essay questions can be included on a classroom exam. By contrast, a larger number
of objective items can be administered in the same amount of time, enabling the test to cover
more content.
8. TRUE Both item types can measure similar content or learning objectives. Research has shown
that students respond almost identically to essay and objective test items covering the
same content. Studies¹ by Sax & Collet (1968) and Paterson (1926), conducted forty-two
years apart, reached the same conclusion:
"...there seems to be no escape from the conclusions that the two types of exams are
measuring identical things." (Paterson, p. 246)
This conclusion should not be surprising; after all, a well written essay item requires that
the student (1) have a store of knowledge, (2) be able to relate facts and principles, and (3)
be able to organize such information into a coherent and logical written expression,
whereas an objective test item requires that the student (1) have a store of knowledge, (2)
be able to relate facts and principles, and (3) be able to organize such information into a
coherent and logical choice among several alternatives.
9. TRUE Both objective and essay test items are good devices for measuring student achievement.
However, as seen in the previous quiz answers, there are particular measurement
situations where one item type is more appropriate than the other. Following is a set of
recommendations for using either objective or essay test items: (Adapted from Robert L.
Ebel, Essentials of Educational Measurement, 1972, p. 144).
measure almost any important educational achievement a written test can measure.
test understanding and ability to apply principles.
test ability to think critically.
test ability to solve problems.
test ability to select relevant facts and principles and to integrate them toward the solution of
complex problems.
¹ Gilbert Sax and LeVerne S. Collet, "An Empirical Comparison of the Effects of Recall and Multiple-Choice
Tests on Student Achievement," Journal of Educational Measurement, vol. 5 (1968), 169-73.
Donald G. Paterson, "Do New and Old Type Examinations Measure Different Mental Functions?"
School and Society, vol. 24 (August 21, 1926), 246-48.
In addition to the preceding suggestions, it is important to realize that certain item types are
better suited than others for measuring particular learning objectives. For example, learning
objectives requiring the student to demonstrate or to show may be better measured by
performance test items, whereas objectives requiring the student to explain or to describe
may be better measured by essay test items. The matching of learning objective expectations
with certain item types can help you select an appropriate kind of test item for your classroom
exam as well as provide a higher degree of test validity (i.e., testing what is supposed to be
tested). To further illustrate, several sample learning objectives and appropriate test items are
provided on the following page.
After you have decided whether to use objective items, essay items or both, the
next step is to select the kind(s) of objective or essay item that you wish to include on the
exam. To help you make such a choice, the different kinds of objective and essay items are
presented in the following section of this booklet. The various kinds of items are briefly
described and compared to one another in terms of their advantages and limitations for use.
Also presented is a set of general suggestions for the construction of each item variation.
The multiple-choice item consists of two parts: (a) the stem, which identifies the question or
problem, and (b) the response alternatives. Students are asked to select the one alternative that
best completes the statement or answers the question. For example,
*correct response
Advantages in Using Multiple-Choice Items
place a high degree of dependence on the student's reading ability and instructor's
writing ability.
The Stem
1. When possible, state the stem as a direct question rather than as an incomplete statement.
Undesirable: While ironing her formal, Jane burned her hand accidentally on the hot
iron. This was due to a transfer of heat be ...
Desirable: Which of the following ways of heat transfer explains why Jane's hand was
burned after she touched a hot iron?
4. Include in the stem any word(s) that might otherwise be repeated in each alternative.
Desirable: In national elections in the United States the President is officially chosen
by
a. the people.
b. members of Congress.
c. the House of Representatives.
*d. the Electoral College.
5. Use negatively stated stems sparingly. When used, underline and/or capitalize the
negative word.
Undesirable          Desirable
a. Digestion         a. Digestion
b. Relaxation        b. Assimilation
*c. Respiration      *c. Respiration
d. Exertion          d. Catabolism
7. Make the alternatives grammatically parallel with each other, and consistent with the stem.
Undesirable: What would do most to advance the application of atomic discoveries to medicine?
Desirable: What would do most to advance the application of atomic discoveries to medicine?
8. Make the alternatives mutually exclusive.
Undesirable: The daily minimum required amount of milk that a 10-year-old child should drink is
a. 1-2 glasses.
*b. 2-3 glasses.
*c. 3-4 glasses.
d. at least 4 glasses.
Desirable: What is the daily minimum required amount of milk a 10-year-old child should drink?
a. 1 glass.
b. 2 glasses.
*c. 3 glasses.
d. 4 glasses.
9. When possible, present alternatives in some logical order (e.g., chronological, most to least,
alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles per hour and the
other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable     Desirable
a. 6 p.m.       a. 1 a.m.
b. 9 p.m.       b. 6 a.m.
c. 1 a.m.       c. 9 a.m.
*d. 1 p.m.      *d. 1 p.m.
e. 6 a.m.       e. 6 p.m.
10. Be sure there is only one correct or best response to the item.
Undesirable: The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
*d. consistency.
Desirable: The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
d. standardization.
11. Make alternatives approximately equal in length.
Undesirable: The most general cause of low individual incomes in the United States is
Desirable: What is the most general cause of low individual incomes in the United States?
12. Avoid irrelevant clues such as grammatical structure, well known verbal associations or
connections between stem and answer.
13. Use at least four alternatives for each item to lower the probability of getting the item correct by
guessing.
14. Randomly distribute the correct response among the alternative positions throughout the test,
so that each of the positions a, b, c, d and e holds the correct response about equally often.
15. Use the alternatives "none of the above" and "all of the above" sparingly. When used, such
alternatives should occasionally be the correct response.
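Suggestion 14 can be followed mechanically when assembling a test. The Python sketch below is a hypothetical helper (the name `balanced_answer_key` is illustrative, not from this booklet). It assumes five alternatives per item, cycles through the letters so that each position is keyed either floor(n/5) or ceil(n/5) times, and then shuffles the key so that no pattern emerges. Items whose alternatives follow a logical order (suggestion 9) have their keyed position fixed by that order, so in practice they would be placed by hand rather than by this helper.

```python
import random
from collections import Counter

def balanced_answer_key(num_items, positions="abcde", seed=None):
    """Hypothetical helper: return a random answer key in which every
    position (a-e by default) is keyed either floor(n/k) or ceil(n/k) times."""
    rng = random.Random(seed)
    letters = list(positions)
    rng.shuffle(letters)  # randomise which letters receive any leftover items
    # Cycle through the letters so the counts differ by at most one...
    key = [letters[i % len(letters)] for i in range(num_items)]
    rng.shuffle(key)      # ...then randomise the order so no pattern emerges
    return key

key = balanced_answer_key(23, seed=1)
print(key)
print(Counter(key))       # each letter appears 4 or 5 times
```

Cycling before shuffling guarantees the balance; shuffling alone (or drawing each letter at random) would only make it likely.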
A true-false item can be written in one of three forms: simple, complex, or compound. Answers can
consist of only two choices (simple), more than two choices (complex), or two choices plus a
conditional completion response (compound). An example of each type of true-false item follows:
incorporate an extremely high guessing factor. For simple true-false items, each student has a
50/50 chance of correctly answering the item without any knowledge of the item's content.
can often lead an instructor to write ambiguous statements due to the difficulty of writing
statements which are unequivocally true or false.
do not discriminate between students of varying ability as well as other item types.
can often include more irrelevant clues than do other item types.
can often lead an instructor to favor testing of trivial knowledge.
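The guessing factor can be made concrete with the binomial distribution. The following Python sketch assumes a student who knows nothing and guesses blindly and independently on every item; the 20-item test length and the 14-correct (70%) threshold are hypothetical values chosen only for illustration. It compares simple true-false items with four- and five-alternative multiple-choice items, which also shows why multiple-choice suggestion 13 calls for at least four alternatives.

```python
from math import comb

def chance_score(num_items, num_alternatives):
    """Expected number of items a blind guesser answers correctly."""
    return num_items / num_alternatives

def prob_at_least(num_items, num_alternatives, min_correct):
    """Probability that blind guessing gets at least `min_correct` items right
    (upper tail of a binomial distribution with p = 1 / num_alternatives)."""
    p = 1 / num_alternatives
    return sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(min_correct, num_items + 1)
    )

# Hypothetical 20-item test with a 70% (14-item) threshold.
for alternatives, label in [(2, "true-false"), (4, "4-option MC"), (5, "5-option MC")]:
    print(f"{label:>12}: expected chance score {chance_score(20, alternatives):4.1f}/20, "
          f"P(at least 14 correct) = {prob_at_least(20, alternatives, 14):.5f}")
```

Under these assumptions a blind guesser expects 10 of 20 simple true-false items and reaches 14 correct roughly 6% of the time; with four alternatives the expected chance score falls to 5 and reaching 14 becomes negligible.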
1. Base true-false items upon statements that are absolutely true or false, without qualifications or
exceptions.
Undesirable: When you see a highway with a marker that reads, "Interstate 80" you know that the
construction and upkeep of that road is built and maintained by the state and
federal government.
Desirable: The construction and maintenance of interstate highways is provided by both state
and federal governments.
3. Express a single idea in each test item.
Undesirable: Water will boil at a higher temperature if the atmospheric pressure on its surface is
increased and more heat is applied to the container.
Desirable: Water will boil at a higher temperature if the atmospheric pressure on its surface is
increased.
and/or
Water will boil at a higher temperature if more heat is applied to the container.
4. Include enough background information and qualifications so that the ability to respond correctly
to the item does not depend on some special, uncommon knowledge.
Undesirable: The second principle of education is that the individual gathers knowledge.
Desirable: According to John Dewey, the second principle of education is that the individual
gathers knowledge.
5. Avoid lifting statements from the text, lecture or other materials so that memory alone will not
permit a correct answer.
Undesirable: According to some politicians, the raison d'etre for capital punishment is
retribution.
Desirable: According to some politicians, justification for the existence of capital punishment
is retribution.
8. Avoid the use of specific determiners which would permit a test-wise but unprepared examinee to
respond correctly. Specific determiners refer to sweeping terms like "all," "always," "none,"
"never," "impossible," "inevitable," etc. Statements including such terms are likely to be false. On
the other hand, statements using qualifying determiners such as "usually," "sometimes," "often,"
etc., are likely to be true. When statements do require the use of specific determiners, make sure
they appear in both true and false items.
Each molecule of a given compound is chemically the same as every other molecule
of that compound. (T)
The galvanometer is the instrument usually used for the metering of electrical
energy used in a home. (F)
9. False items tend to discriminate more highly than true items. Therefore, use more false items than
true items (but no more than 15% additional false items).
In general, matching items consist of a column of stimuli presented on the left side of the exam page
and a column of responses placed on the right side of the page. Students are required to match the
response associated with a given stimulus. For example,
Directions: On the line to the left of each factual statement, write the letter of the principle which
best explains the statement's occurrence. Each principle may be used more than once.
Matching items
require short periods of reading and response time, allowing you to cover more content.
provide objective measurement of student achievement or ability.
provide highly reliable test scores.
provide scoring efficiency and accuracy.
Matching items
have difficulty measuring learning objectives requiring more than simple recall of
information.
are difficult to construct due to the problem of selecting a common set of stimuli and
responses.
1. Include directions which clearly state the basis for matching the stimuli with the responses.
Explain whether or not a response can be used more than once and indicate where to write the
answer.
Desirable: Directions: On the line to the left of each compound in Column I, write the letter of
the compound's formula presented in Column II. Use each formula only
once.
Column I               Column II
1. ___ Water           A. H₂SO₄
2. ___ Salt            B. HCl
3. ___ Ammonia         C. NaCl
4. ___ Sulfuric Acid   D. H₂O
                       E. H₂HCl
3. Arrange the list of responses in some systematic order if possible (e.g., chronological,
alphabetical).
Directions: On the line to the left of each definition in Column I, write the letter of the defense
mechanism in Column II that is described. Use each defense mechanism only once.
Column I
____1. Hunting for reasons to support one's beliefs.
____2. Accepting the values and norms of others as one's own even when they are contrary to
previously held values.
____3. Attributing to others one's own unacceptable impulses, thoughts and desires.
____4. Ignoring disagreeable situations, topics, sights.
Column II
Undesirable              Desirable
a. Rationalization       a. Denial of reality
b. Identification        b. Identification
c. Projection            c. Introjection
d. Introjection          d. Projection
e. Denial of reality     e. Rationalization
4. Avoid grammatical or other clues to the correct response.
Undesirable: Directions: Match the following in order to complete the sentences on the left.
___ 1. Igneous rocks are formed A. a hardness of 7.
___ 2. The formation of coal requires B. with crystalline rock.
___ 3. A geode is filled C. a metamorphic rock.
___ 4. Feldspar is classified as D. heat and pressure.
E. through the solidification of molten lava.
5. Keep matching items brief, limiting the list of stimuli to under 10.
6. Include more responses than stimuli to help prevent answering through the process of elimination.
7. When possible, reduce the amount of reading time by including only short phrases or single words in
the response list.
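The effect of including more responses than stimuli can be quantified. The Python sketch below is hypothetical (the function name is illustrative); it assumes that each response may be used only once, that the student knows some matches for certain, and that the remaining stimuli are filled by guessing at random among the unused responses.

```python
from math import perm

def prob_rest_by_elimination(num_stimuli, num_responses, num_known):
    """Chance that a student who knows `num_known` matches for certain answers
    every remaining stimulus correctly by guessing among the unused responses.

    Assumes each response may be used only once and guesses are made at random
    without reusing a response, so there are perm(remaining, unknown) equally
    likely ways to fill the unknown stimuli, exactly one of which is correct.
    """
    unknown = num_stimuli - num_known
    remaining = num_responses - num_known
    return 1 / perm(remaining, unknown)

for responses in (4, 5, 6):
    print(f"4 stimuli, {responses} responses, 3 known: "
          f"{prob_rest_by_elimination(4, responses, 3):.2f}")
```

With four stimuli and four responses, a student who knows three matches gets the fourth for free; adding a fifth response lowers that to one chance in two, and a sixth to one in three.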
The completion item requires the student to answer a question or to finish an incomplete statement by
filling in a blank with the correct word or phrase. For example,
According to Freud, personality is made up of three major systems, the _________, the ________
and the ________.
Completion items
2. Do not omit so many words from the statement that the intended meaning is lost.
Undesirable: Most of the United States' libraries are organized according to the (Dewey) decimal
system.
Desirable: Which organizational system is used by most of the United States' libraries? (Dewey
decimal)
Undesirable: Trees which shed their leaves annually are (seed-bearing, common).
Desirable: Trees which shed their leaves annually are called (deciduous).
Undesirable: In Roman mythology, Vulcan was the son of (Jupiter) and (Juno).
Desirable: In Roman mythology, Vulcan was the son of (Jupiter) and (Juno).
6. When possible, delete words at the end of the statement after the student has been presented a
clearly defined problem.
7. Avoid lifting statements directly from the text, lecture or other sources.
8. Limit the required response to a single word or phrase.
The essay test is probably the most popular of all types of teacher-made tests. In general, a classroom
essay test consists of a small number of questions in response to which the student is expected to
demonstrate his/her ability to (a) recall factual knowledge, (b) organize this knowledge and (c) present the
knowledge in a logical, integrated answer to the question. An essay test item can be classified as
either an extended-response essay item or a short-answer essay item. The latter calls for a more
restricted or limited answer in terms of form or scope. An example of each type of essay item follows.
Extended-response essay item:
Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-Organism-Response)
theories of personality. Include in your answer (a) brief descriptions of both theories, (b)
supporters of both theories and (c) research methods used to study each of the two theories. (10 pts.,
20 minutes)
Short-answer essay item:
Identify research methods used to study the S-R (Stimulus-Response) and S-O-R (Stimulus-Organism-Response)
theories of personality. (5 pts., 10 minutes)
Essay items
are easier and less time consuming to construct than are most other item types.
provide a means for testing students' ability to compose an answer and present it in a logical
manner.
can efficiently measure higher order cognitive objectives (e.g., analysis, synthesis,
evaluation).
Limitations in Using Essay Items
Essay items
1. Prepare essay items that elicit the type of behavior you want to measure.
Learning Objective: The student will be able to explain how the normal curve serves as a
statistical model.
Undesirable: Describe a normal curve in terms of: symmetry, modality, kurtosis and skewness.
Desirable: Briefly explain how the normal curve serves as a statistical model for estimation
and hypothesis testing.
2. Phrase each item so that the student's task is clearly indicated.
Undesirable: Discuss the economic factors which led to the stock market crash of 1929.
Desirable: Identify the three major economic conditions which led to the stock market crash of
1929. Discuss briefly each condition in correct chronological sequence and in one
paragraph indicate how the three factors were inter-related.
3. Indicate for each item a point value or weight and an estimated time limit for answering.
Undesirable: Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of
characterization, and dialogue styles of their main characters.
Desirable: Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of
characterization, and dialogue styles of their main characters. (10 points 20
minutes)
4. Ask questions that will elicit responses on which experts could agree that one answer is better than
another.
5. Avoid giving the student a choice among optional items as this greatly reduces the reliability of
the test.
6. It is generally recommended for classroom examinations to administer several short-answer items
rather than only one or two extended-response items.
1. Choose a scoring model. Two of the more common scoring models are ANALYTICAL
SCORING and GLOBAL QUALITY.
ANALYTICAL SCORING: Each answer is compared to an ideal answer, and points are assigned for the
inclusion of necessary elements. Grades are based on the number of accumulated points, either
absolutely (e.g., A = 10 or more points, B = 6-9 pts., etc.) or relatively (e.g., A = top 15% of
scores, B = next 30% of scores, etc.).
GLOBAL QUALITY: Each answer is read and assigned a score (e.g., grade, total points) based either
on the total quality of the response or on the total quality of the response relative to other
student answers.
"Americans are a mixed-up people with no sense of ethical values. Everyone knows that baseball
is far less necessary than food and steel, yet they pay ball players a lot more than farmers and
steelworkers."
WHY? Use 3-4 sentences to indicate how an economist would explain the above situation.
Analytical Scoring
Global Quality
Assign scores or grades on the overall quality of the written response as compared to an ideal
answer. Or, compare the overall quality of a response to other student responses by sorting the
papers into three stacks:
Below Average    Average    Above Average
Read each stack again and divide it into three more stacks.
In total, nine discriminations can be used to assign test grades in this manner. The number of stacks
or discriminations can vary to meet your needs.
2. Try not to allow factors that are irrelevant to the learning outcomes being measured (e.g.,
handwriting, spelling, neatness) to affect your grading.
3. Read and grade all class answers to one item before going on to the next item.
4. Read and grade the answers without looking at the students' names to avoid possible preferential
treatment.
5. Occasionally shuffle papers during the reading of answers to help avoid any systematic order
effects (e.g., Sally's "B" work always followed Jim's "A" work, so it looked more like "C" work).
6. When possible, ask another instructor to read and grade your students' responses.
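The scoring procedures above can be sketched in Python. This is a hypothetical illustration, not a procedure from this booklet. `analytical_grade` uses only the absolute cutoffs the text states (A = 10 or more points, B = 6-9) and `relative_grades` only the stated percentages (top 15% A, next 30% B); the lower grades are elided in the text ("etc."), so they appear as a placeholder. `nine_stack_sort` stands in for the reader's judgment with a numeric quality rating and assumes stacks of roughly equal size, which a real sort need not produce. `reading_orders` applies suggestions 3-5: papers are read anonymously, one item at a time, in a freshly shuffled order for each item.

```python
import random

def analytical_grade(points):
    """Absolute grading with the cutoffs stated in the text (A = 10+, B = 6-9).
    The lower cutoffs are elided in the text, so a placeholder is returned."""
    if points >= 10:
        return "A"
    if points >= 6:
        return "B"
    return "C or below"  # placeholder for the elided lower grades

def relative_grades(scores):
    """Relative grading: top 15% of scores -> A, next 30% -> B (as in the text);
    the remaining papers get the same placeholder band. Ties are not handled."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    a_cut = round(0.15 * len(ranked))
    b_cut = a_cut + round(0.30 * len(ranked))
    return {student: "A" if i < a_cut else "B" if i < b_cut else "C or below"
            for i, student in enumerate(ranked)}

def split_into(items, k):
    """Split an ordered list into k consecutive stacks of near-equal size."""
    size, extra = divmod(len(items), k)
    stacks, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        stacks.append(items[start:end])
        start = end
    return stacks

def nine_stack_sort(judged_quality):
    """Global quality: sort into three stacks (below average, average, above
    average), then re-sort each stack into three, giving nine ordered groups.
    `judged_quality` is a numeric stand-in for the reader's judgment."""
    ordered = sorted(judged_quality, key=judged_quality.get)  # weakest first
    return [sub for stack in split_into(ordered, 3) for sub in split_into(stack, 3)]

def reading_orders(student_names, num_items, seed=None):
    """Suggestions 3-5: replace names with codes and give every essay item its
    own shuffled reading order, so one item is read across all papers at a time."""
    rng = random.Random(seed)
    names = list(student_names)
    rng.shuffle(names)  # codes should not reveal roster order
    code_key = {name: f"P{i:03d}" for i, name in enumerate(names, start=1)}
    orders = {}
    for item in range(1, num_items + 1):
        order = list(code_key.values())
        rng.shuffle(order)  # fresh order per item to avoid systematic order effects
        orders[item] = order
    return orders, code_key  # keep code_key from the reader until grading is done

print(analytical_grade(8))  # B
orders, code_key = reading_orders(["Sally", "Jim", "Ana"], num_items=2, seed=0)
print(orders)
```

Reshuffling for every item, rather than once per exam, is what keeps Sally's paper from always being read right after Jim's.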
Another form of a subjective test item is the problem solving or computational exam question. Such
items present the student with a problem situation or task and require a demonstration of work
procedures and a correct solution, or just a correct solution. This kind of test item is classified as a
subjective type of item due to the procedures used to score item responses. Instructors can assign full
or partial credit to either correct or incorrect solutions depending on the quality and kind of work
procedures presented. An example of a problem solving test item follows.
It was calculated that 75 men could complete a strip on a new highway in 70 days. When work was
scheduled to commence, it was found necessary to send 25 men on another road project. How many
days longer will it take to complete the strip? Show your work for full or partial credit.
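A worked solution for this sample item can be written as a short Python sketch. It assumes that all workers are equally productive, so the job requires a fixed number of worker-days; the function name `extra_days` is hypothetical.

```python
from fractions import Fraction

def extra_days(workers_planned, days_planned, workers_removed):
    """Assumes equal productivity, so the job is a fixed number of worker-days."""
    worker_days = workers_planned * days_planned           # 75 * 70 = 5,250 worker-days
    workers_left = workers_planned - workers_removed        # 75 - 25 = 50 workers
    days_needed = Fraction(worker_days, workers_left)       # 5,250 / 50 = 105 days
    return days_needed - days_planned                       # 105 - 70 = 35 extra days

print(extra_days(75, 70, 25))  # 35
```

Showing each intermediate quantity mirrors the work a student would need to present for partial credit.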
Undesirable: During a car crash, the car slows down at the rate of 490 m/sec². What is the
magnitude and direction of the force acting on a 100-kg driver?
Desirable: During a car crash, the car slows down at the rate of 490 m/sec². Using the car as
a frame of reference, what is the magnitude and direction of the gram force acting
on a 100-kg driver?
2. Provide directions which clearly inform the student of the type of response called for.
Undesirable: An American tourist in Paris finds that he weighs 70 kilograms. When he left the
United States he weighed 144 pounds. What was his net change in weight?
Desirable: An American tourist in Paris finds that he weighs 70 kilograms. When he left the
United States he weighed 144 pounds. What was his net weight change in pounds?
3. State in the directions whether or not the student must show his/her work procedures for full or
partial credit.
Undesirable: A double concave lens is made of glass with n = 1.50. If the radii of curvature of the
two lens surfaces are both 30.0 cm, what is the focal length of the lens?
Desirable: A double concave lens is made of glass with n = 1.50. If the radii of curvature of the
two lens surfaces are both 30.0 cm, what is the focal length of the lens? Show your
work to receive full or partial credit.
4. Clearly separate item parts and indicate their point values.
A man leaves his home and drives to a convention at an average rate of 50 miles per hour. Upon
arrival, he finds a telegram advising him to return at once. He catches a plane that takes him back
at an average rate of 300 miles per hour.
Undesirable: If the total traveling time was 1 3/4 hours, how long did it take him to fly back?
How far from his home was the convention?
Desirable: If the total traveling time was 1 3/4 hours:
Undesirable: An automobile weighing 2,840 N (about 640 pounds) is traveling at a speed of 300
miles per hour. What is the car's kinetic energy? Show your work. (2 pts.)
Desirable: An automobile weighing 14,200 N (about 3200 pounds) is traveling at a speed of
12 m/sec. What is the car's kinetic energy? Show your work. (2 pts.)
6. Ask questions that elicit responses on which experts could agree that one solution and one or more
work procedures are better than others.
7. Work through each problem before classroom administration to double-check accuracy.
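Suggestion 7 can be carried out with a short script. The Python sketch below works the sample problems from this section. It assumes g = 9.8 m/s², 1 kg = 2.20462 lb, and the common sign convention for the lensmaker's equation in which a double concave lens has R1 < 0 and R2 > 0. All function names are hypothetical. For the car-crash item it computes only the magnitude in newtons; the direction is noted in a comment.

```python
from fractions import Fraction

G = 9.8                # m/s^2, assumed value of gravitational acceleration
KG_TO_LB = 2.20462     # pounds per kilogram (rounded)

def crash_force(mass_kg, decel_m_s2):
    """Magnitude of the force on the driver, F = m * a. In the car's frame the
    (inertial) force points forward; in the ground frame the restraining
    force on the driver points backward."""
    return mass_kg * decel_m_s2

def weight_change_lb(now_kg, before_lb):
    """Net weight change in pounds (positive means a gain)."""
    return now_kg * KG_TO_LB - before_lb

def thin_lens_focal_length(n, r1_cm, r2_cm):
    """Lensmaker's equation 1/f = (n - 1)(1/R1 - 1/R2)."""
    return 1 / ((n - 1) * (1 / r1_cm - 1 / r2_cm))

def convention_trip(drive_mph, fly_mph, total_hours):
    """Solve d/drive + d/fly = total exactly; return (miles, hours flying back)."""
    distance = total_hours / (Fraction(1, drive_mph) + Fraction(1, fly_mph))
    return distance, distance / fly_mph

def kinetic_energy(weight_n, speed_m_s):
    """KE = m v^2 / 2, with mass obtained from weight as m = W / g."""
    return 0.5 * (weight_n / G) * speed_m_s**2

print(crash_force(100, 490))                                # 49000 (N)
print(round(weight_change_lb(70, 144), 1))                  # 10.3 (lb gained)
print(round(thin_lens_focal_length(1.50, -30.0, 30.0), 1))  # -30.0 (cm, diverging)
print(convention_trip(50, 300, Fraction(7, 4)))             # (Fraction(75, 1), Fraction(1, 4))
print(round(kinetic_energy(14200, 12)))                     # 104327 (J)
```

Exact fractions are used for the convention problem so the answers (75 miles, and 1/4 hour or 15 minutes of flying) come out without rounding error; the other results depend on the assumed constants.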
A performance test item is designed to assess the ability of a student to perform correctly in a
simulated situation (i.e., a situation in which the student will be ultimately expected to apply his/her
learning). The concept of simulation is central in performance testing; a performance test will
simulate to some degree a real life situation to accomplish the assessment. In theory, a performance
test could be constructed for any skill and real life situation. In practice, most performance tests have
been developed for the assessment of vocational, managerial, administrative, leadership,
communication, interpersonal and physical education skills in various simulated situations. An
illustrative example of a performance test item is provided below.
Sample Performance Test Item
Assume that some of the instructional objectives of an urban planning course include the
development of the student's ability to effectively use the principles covered in the course in various
"real life" situations common for an urban planning professional. A performance test item could
measure this development by presenting the student with a specific situation which represents a "real
life" situation. For example,
An urban planning board makes a last minute request for the professional to act as consultant and
critique a written proposal which is to be considered in a board meeting that very evening. The
professional arrives before the meeting and has one hour to analyze the written proposal and prepare
his critique. The critique presentation is then made verbally during the board meeting; reactions of
members of the board or the audience include requests for explanation of specific points or informed
attacks on the positions taken by the professional.
The performance test designed to simulate this situation would require the student being tested to
role-play the professional's part, while students or faculty act the other roles in the situation. Various
aspects of the "professional's" performance would then be observed and rated by several judges with
the necessary background. The ratings could then be used both to provide the student with a
diagnosis of his/her strengths and weaknesses and to contribute to an overall summary evaluation of
the student's abilities.
can most appropriately measure learning objectives which focus on the ability of the
students to apply skills or knowledge in real life situations.
usually provide a degree of test validity not possible with standard paper and pencil test
items.
are useful for measuring learning objectives in the psychomotor domain.
1. Prepare items that elicit the type of behavior you want to measure.
2. Clearly identify and explain the simulated situation to the student.
3. Make the simulated situation as "life-like" as possible.
4. Provide directions which clearly inform the students of the type of response called for.
5. When appropriate, clearly state time and activity limitations in the directions.
6. Adequately train the observer(s)/scorer(s) to ensure that they are fair in scoring the
appropriate behaviors.
This section of the booklet presents two methods for collecting feedback on the quality of your test
items. The two methods include using self-review checklists and student evaluation of test item
quality. You can use the information gathered from either method to identify strengths and
weaknesses in your item writing.
EVALUATE YOUR TEST ITEMS BY CHECKING THE SUGGESTIONS WHICH YOU FEEL YOU
HAVE FOLLOWED.
____ When possible, stated the stem as a direct question rather than as an incomplete statement.
____ Presented a definite, explicit and singular question or problem in the stem.
____ Eliminated excessive verbiage or irrelevant information from the stem.
____ Included in the stem any word(s) that might have otherwise been repeated in each alternative.
____ Used negatively stated stems sparingly. When used, underlined and/or capitalized the negative
word(s).
____ Made all alternatives plausible and attractive to the less knowledgeable or skillful student.
____ Made the alternatives grammatically parallel with each other, and consistent with the stem.
____ Made the alternatives mutually exclusive.
____ When possible, presented alternatives in some logical order (e.g., chronologically, most to
least).
____ Made sure there was only one correct or best response per item.
____ Made alternatives approximately equal in length.
____ Avoided irrelevant clues such as grammatical structure, well known verbal associations or
connections between stem and answer.
____ Used at least four alternatives for each item.
____ Randomly distributed the correct response among the alternative positions throughout the test
having approximately the same proportion of alternatives a, b, c, d, and e as the correct
response.
____ Used the alternatives "none of the above" and "all of the above" sparingly. When used, such
alternatives were occasionally the correct response.
____ Based true-false items upon statements that are absolutely true or false, without qualifications
or exceptions.
____ Expressed the item statement as simply and as clearly as possible.
____ Expressed a single idea in each test item.
____ Included enough background information and qualifications so that the ability to respond
correctly did not depend on some special, uncommon knowledge.
____ Avoided lifting statements from the text, lecture or other materials.
____ Avoided using negatively stated item statements.
____ Avoided the use of unfamiliar language.
____ Avoided the use of specific determiners such as "all," "always," "none," "never," etc., and
qualifying determiners such as "usually," "sometimes," "often," etc.
____ Used more false items than true items (but not more than 15% additional false items).
____ Included directions which clearly stated the basis for matching the stimuli with the response.
____ Explained whether or not a response could be used more than once and indicated where to
write the answer.
____ Used only homogeneous material.
____ When possible, arranged the list of responses in some systematic order (e.g., chronologically,
alphabetically).
____ Avoided grammatical or other clues to the correct response.
____ Kept items brief (limited the list of stimuli to under 10).
____ Included more responses than stimuli.
____ When possible, reduced the amount of reading time by including only short phrases or single
words in the response list.
____ Prepared items that elicited the type of behavior you wanted to measure.
____ Phrased each item so that the student's task was clearly indicated.
____ Indicated for each item a point value or weight and an estimated time limit for answering.
____ Asked questions that elicited responses on which experts could agree that one answer is better
than others.
____ Avoided giving the student a choice among optional items.
____ Administered several short-answer items rather than 1 or 2 extended-response items.
____ Prepared items that elicited the type of behavior you wanted to measure.
____ Clearly identified and explained the simulated situation to the student.
____ Made the simulated situation as "life-like" as possible.
____ Provided directions which clearly informed the students of the type of response called for.
____ When appropriate, clearly stated time and activity limitations in the directions.
____ Adequately trained the observer(s)/scorer(s) to ensure that they were fair in scoring the
appropriate behaviors.
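A few of the multiple-choice checklist entries can be checked automatically before a human review. The following Python sketch is hypothetical: the `MultipleChoiceItem` class, the `review` function and the 2:1 length ratio standing in for "approximately equal in length" are illustrative assumptions. Entries that require judgment, such as the plausibility of distractors or grammatical clues, are not covered.

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    stem: str
    alternatives: list
    keyed: set  # indexes (0-based) of alternatives marked as correct

def review(item, max_length_ratio=2.0):
    """Return the mechanically checkable checklist entries the item fails.
    The 2:1 length ratio is an assumed stand-in for 'approximately equal'."""
    problems = []
    if len(item.alternatives) < 4:
        problems.append("fewer than four alternatives")
    if len(item.keyed) != 1:
        problems.append(f"{len(item.keyed)} keyed responses; expected exactly one")
    lengths = [len(a.strip()) for a in item.alternatives]
    if lengths and min(lengths) > 0 and max(lengths) / min(lengths) > max_length_ratio:
        problems.append("alternatives differ greatly in length")
    normalised = [a.strip().lower() for a in item.alternatives]
    if len(set(normalised)) != len(normalised):
        problems.append("duplicate alternatives")
    if not item.stem.rstrip().endswith("?"):
        problems.append("stem is not a direct question (acceptable if deliberate)")
    return problems

# The "Undesirable" validity/reliability item from suggestion 10.
undesirable = MultipleChoiceItem(
    "The two most desired characteristics in a classroom test are validity and",
    ["precision.", "reliability.", "objectivity.", "consistency."],
    keyed={1, 3},
)
print(review(undesirable))
```

The stem check is reported rather than enforced because suggestion 1 asks for direct questions only "when possible."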
Ebel, Robert L. Measuring educational achievement. Englewood Cliffs, New Jersey: Prentice-Hall,
1965, Chapters 4-6.
Ebel, Robert L. Essentials of educational measurement. Englewood Cliffs, New Jersey: Prentice-
Hall, 1972, Chapters 5-8.
Gronlund, N. E. Measurement and evaluation in teaching. New York: Macmillan Publishing Co.,
1976, Chapters 6-9.
Mehrens, W. A. & Lehmann, I. J. Measurement and evaluation in education and psychology. New
York: Holt, Rinehart & Winston, Inc., 1973, Chapters 7-10.
Nelson, C. H. Measurement and evaluation in the classroom. New York: Macmillan Publishing Co.,
1970, Chapters 5-8. Measurement and Evaluation Division, 247 Armory Building. Especially
useful for science instruction.
Payne, David A. The assessment of learning. Lexington, Mass.: D.C. Heath and Co., 1974, Chapters
4-7.
Scannell, D. P. & Tracy, D. B. Testing and measurement in the classroom. New York: Houghton-
Mifflin Co., 1975, Chapters 4-6.
Thorndike, R. L. (Ed.). Educational measurement (2nd ed.). Washington, D.C.: American Council on
Education, 1971, Chapter 9 (Performance testing) and Chapter 10 (Essay exams).