Of Tests and Testing
Bill Chislev Jeff Cabrera
Assumptions about Psychological Testing
4-2
Psychological Traits and States Exist
• A trait has been defined as “any distinguishable, relatively enduring way
in which one individual varies from another” (Guilford, 1959, p. 6).
• States also distinguish one person from another but are relatively less
enduring (Chaplin et al., 1988).
• Thousands of trait terms can be found in the English language (e.g.,
outgoing, shy, reliable, calm, etc.).
4-3
Psychological Traits and States Exist
• Psychological traits exist as constructs:
informed, scientific concepts developed
or constructed to describe or explain
behavior.
• We can’t see, hear, or touch constructs,
but we can infer their existence from
overt behavior, such as test scores.
4-4
• Traits are relatively stable. They may
change over time, yet there are often
high correlations between trait scores at
different time points.
• The nature of the situation influences
how traits will be manifested.
• The terms trait and state, as we use them,
refer to ways in which one individual varies,
or differs, from another.
Some people score higher than others
on traits like sensation-seeking.
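The point that traits may change over time while trait scores at different time points stay highly correlated can be sketched in code. This is a minimal illustration, assuming a Pearson correlation is the index of stability; the scores and the helper name `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical trait scores for five people at two time points.
# Every score rose by 3 points, yet each person's standing is unchanged.
time_1 = [10, 12, 14, 16, 18]
time_2 = [13, 15, 17, 19, 21]

r = pearson_r(time_1, time_2)
print(r)  # 1.0
```

Here every individual score changed, but the correlation is perfect because relative standing was preserved.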
4-5
Traits and States Can Be Quantified and Measured
• Different test developers may define and
measure constructs in different ways.
• Once a construct is defined, test developers
turn to item content and item weighting.
• A scoring system and a way to interpret
results need to be devised.
Cumulative scoring: the higher the test score,
the greater the presumed strength of the targeted
ability, trait, or state.
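A minimal sketch of cumulative scoring with optional item weighting follows. The function name `cumulative_score`, the 0/1 item coding, and the weights are assumptions made for illustration, not a scoring system taken from any particular test.

```python
def cumulative_score(responses, weights=None):
    """Sum item scores, optionally weighted, into one total score."""
    if weights is None:
        weights = [1] * len(responses)  # unweighted: every item counts equally
    if len(weights) != len(responses):
        raise ValueError("one weight per item is required")
    return sum(r * w for r, w in zip(responses, weights))

# Hypothetical 5-item scale; 1 = item endorsed, 0 = not endorsed.
low = cumulative_score([0, 1, 0, 0, 1])
high = cumulative_score([1, 1, 1, 0, 1])

# Same responses as `high`, with items 3 and 5 weighted double.
weighted = cumulative_score([1, 1, 1, 0, 1], weights=[1, 1, 2, 1, 2])

print(low, high, weighted)  # 2 4 6
```

Under the cumulative model, the test taker with the total of 4 is presumed to be higher on the targeted trait than the one with the total of 2.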
4-6
Test-Related Behavior Predicts Non-Test-Related Behavior
Responses on tests are thought to predict real-world behavior. The
obtained sample of behavior is expected to predict future behavior.
Tests Have Strengths and Weaknesses
Competent test users understand and appreciate the limitations of the tests
they use as well as how those limitations might be compensated for by data
from other sources.
4-7
Various Sources of Error are Part of Assessment
Error refers to a long-standing assumption that factors
other than what a test attempts to measure will influence
performance on the test.
Error variance - the component of a test score attributable
to sources other than the trait or ability measured.
• Both the assessee and assessor are sources of error
variance
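The idea of error variance as one component of test scores can be illustrated numerically. The sketch below assumes the classical true-score model (observed score = true score + error, with error uncorrelated with true score), which this slide does not state explicitly; all numbers are hypothetical.

```python
import statistics

# Hypothetical true scores and errors for four test takers.
# The errors are chosen so they are uncorrelated with the true scores.
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)    # variance due to the trait
var_error = statistics.pvariance(errors)        # error variance
var_observed = statistics.pvariance(observed)   # total observed variance

# Under the stated assumption, the two components add up to the total.
print(var_true, var_error, var_observed)

# Proportion of observed variance attributable to the trait itself.
proportion_true = var_true / var_observed
```

The smaller the error variance relative to the total, the more of the observed score variation reflects the trait or ability being measured.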
4-8
Testing and Assessment can be Conducted in a Fair Manner
• All major test publishers strive to develop instruments that are
fair when used in strict accordance with guidelines in the test
manual.
• Problems arise if the test is used with people for whom it was not
intended.
• Some problems are more political than psychometric in nature.
Testing and Assessment Benefit Society
• There is a great need for tests, especially good tests, considering
the many areas of our lives that they benefit.
4-9
What’s a “Good Test”?
Reliability: The consistency of the measuring tool, that is, the
precision with which the test measures and the extent to which
error is present in measurements.
Validity: The test measures what it purports to measure.
Other considerations: Administration, scoring, interpretation
should be straightforward for trained examiners. A good test is a
useful test that will ultimately benefit individual test takers or
society at large.
4-10
Norms
• Norm-referenced testing and assessment: a method of
evaluation and a way of deriving meaning from test scores by
evaluating an individual test taker's score and comparing it to
scores of a group of test takers.
The meaning of an individual test score is understood relative to other
scores on the same test.
• Norms are the test performance data of a particular group of test
takers that are designed for use as a reference when evaluating or
interpreting individual test scores.
A normative sample is the reference group to which test-takers are
compared.
4-11
Norms
• Normative sample: the group of people whose performance on
a particular test is analyzed for reference in evaluating the
performance of individual test takers.
• Norming refers to the process of deriving norms. The term may be
modified to describe a particular type of norm derivation.
4-12
Sampling to Develop Norms
Standardization: The process
of administering a test to a
representative sample of test
takers for the purpose of
establishing norms.
Sampling – Test developers
select a population, for which the
test is intended, that has at least
one common, observable
characteristic.
4-13
Sampling to Develop Norms
Stratified sampling: Sampling that includes different
subgroups, or strata, from the population.
Stratified-random sampling: Stratified sampling in which every
member of the population has an equal opportunity of being
included in the sample.
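One way to sketch stratified-random sampling is shown below. It assumes proportional allocation (the same sampling fraction in every stratum), so that each member of the population has the same chance of selection; the function name, the urban/rural strata, and the population itself are hypothetical.

```python
import random

def stratified_random_sample(population, stratum_of, fraction, seed=0):
    """Randomly draw the same fraction of members from every stratum."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural test takers.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]

sample = stratified_random_sample(population, stratum_of=lambda p: p[0], fraction=0.1)
urban_n = sum(1 for p in sample if p[0] == "urban")
rural_n = sum(1 for p in sample if p[0] == "rural")
print(urban_n, rural_n)  # 6 4
```

The sample mirrors the 60/40 split in the population, while chance alone decides which individuals within each stratum are selected.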
4-14
Sampling to Develop Norms
Purposive sample: Arbitrarily selecting a sample that is
believed to be representative of the population.
4-15
Sampling to Develop Norms
Incidental/convenience sample: A sample that is convenient
or available for use. May not be representative of the
population.
• Generalization of findings from convenience samples must be
made with caution.
4-16
Sampling to Develop Norms
Developing Norms
Having obtained a sample, test developers:
• Administer the test with a standard set of instructions
• Recommend a setting for test administration
• Collect and analyze data
• Summarize data using descriptive statistics including
measures of central tendency and variability
• Normative Sample and Standardization Sample
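The step of summarizing data with descriptive statistics can be sketched with Python's standard `statistics` module. The raw scores below are hypothetical, and the choice of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption.

```python
import statistics

# Hypothetical raw scores from a small normative sample.
raw_scores = [12, 15, 15, 18, 20, 22, 25]

summary = {
    "n": len(raw_scores),
    # Measures of central tendency
    "mean": statistics.mean(raw_scores),
    "median": statistics.median(raw_scores),
    "mode": statistics.mode(raw_scores),
    # Measures of variability
    "range": max(raw_scores) - min(raw_scores),
    "sd": statistics.stdev(raw_scores),  # sample standard deviation
}

for name, value in summary.items():
    print(name, value)
```

A real norming study would report the same kinds of statistics, computed on a far larger and representative sample.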
4-17
Types of Norms
• Percentile: the percentage of people whose score on a test or
measure falls below a particular raw score.
• Percentiles are a popular method for organizing test-related
data because they are easily calculated.
• One problem is that real differences between raw scores
may be minimized near the ends of the distribution and
exaggerated in the middle of the distribution.
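The percentile definition above translates directly into a short calculation. This is a sketch using the "percentage scoring below" definition given on this slide; the function name `percentile_rank` and the normative scores are hypothetical.

```python
def percentile_rank(raw_score, normative_scores):
    """Percentage of the normative sample scoring below raw_score."""
    below = sum(1 for s in normative_scores if s < raw_score)
    return 100 * below / len(normative_scores)

# Hypothetical raw scores from a normative sample of ten test takers.
norms = [40, 45, 50, 50, 55, 60, 65, 70, 75, 90]

print(percentile_rank(60, norms))  # 50.0
print(percentile_rank(90, norms))  # 90.0
print(percentile_rank(40, norms))  # 0.0
```

Other conventions exist (for example, counting half of the tied scores), so results may differ slightly from those in a given test manual.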
4-18
Types of Norms
Age norms: average performance of different samples of test-takers who
were at various ages when the test was administered.
Grade norms: the average test performance of test takers in a given school
grade.
National norms: derived from a normative sample that was nationally
representative of the population at the time the norming study was
conducted.
National anchor norms: An equivalency table for scores on two different
tests. Allows for a basis of comparison.
Subgroup norms: A normative sample can be segmented by any of the
criteria initially used in selecting subjects for the sample.
Local norms: provide normative information with respect to the local
population’s performance on some test.
4-19
Fixed Reference Group Scoring Systems
Fixed Reference Group Scoring Systems: The distribution of
scores obtained on the test from one group of test takers is used as
the basis for the calculation of test scores for future
administrations of the test.
• The SAT employs this method.
4-20
Norm-Referenced vs. Criterion-Referenced
Interpretation
Norm-referenced tests involve comparing individuals to the
normative group. With criterion-referenced tests, test takers are
evaluated as to whether they meet a set standard (e.g., a driving
exam).
We may define a criterion as a standard on which a judgment or
decision may be based.
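The contrast between the two interpretations can be sketched by evaluating the same raw score both ways. The function names, the normative scores, and the cutoff of 75 are all hypothetical values chosen for illustration.

```python
def norm_referenced(score, normative_scores):
    """Interpret a score by its standing relative to the normative group."""
    below = sum(1 for s in normative_scores if s < score)
    return 100 * below / len(normative_scores)

def criterion_referenced(score, cutoff):
    """Interpret a score by whether it meets a set standard."""
    return score >= cutoff

# Hypothetical normative sample and passing standard.
norms = [55, 60, 62, 65, 68, 70, 72, 75, 80, 85]
cutoff = 75

score = 72
standing = norm_referenced(score, norms)     # percentage of the group scoring below
passed = criterion_referenced(score, cutoff)
print(standing, passed)  # 60.0 False
```

The same score of 72 is above average relative to the group yet falls short of the standard, which is why the two kinds of interpretation can lead to different decisions.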
4-21
To be eligible for a high-school diploma, students must
demonstrate at least a sixth-grade reading level.
To earn the privilege of driving an automobile, would-be drivers
must take a road test and demonstrate their driving skill to the
satisfaction of a state-appointed examiner.
To be licensed as a psychologist, the applicant must achieve a
score that meets or exceeds the score mandated by the state on
the licensing test.
To conduct research using human subjects, many universities
and other organizations require researchers to successfully
complete an online course that presents test takers with
ethics-oriented information in a series of modules, followed by
a set of forced-choice questions.
4-22
Culture and Inference
• In selecting a test for use, responsible test users should
research the test’s available norms to check how appropriate
they are for use with the targeted test taker population.
• When interpreting test results, it helps to know about the
culture and era of the test-taker.
• It is important to conduct culturally informed assessment.
4-23