Test Construction and Adaptation
Test Construction
Five main stages in test development

Test conceptualization

Test construction

Test try-out

Item Analysis

Test Revision
Stage 1: Test conceptualization
❑ The beginnings of any published test can probably be traced to thoughts, or self-talk, framed in behavioural terms.
❑ The test developer might say to himself/herself something like, "There ought to be a test designed to measure ______ in (such and such) a way."
❑ A review of the available literature on existing tests designed to measure a particular construct might indicate that such tests leave much to be desired in psychometric soundness.
❑ An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test. The development of a new test may also be a response to the need to assess mastery in an emerging occupation or profession.
Some important questions
❑ Some preliminary questions include:
▪ What is the test designed to measure?
▪ What is the objective of the test?
▪ Is there a need for the test?
▪ Who will use and take this test?
▪ What content should it cover?
▪ How will the test be administered?
▪ What is the ideal format of the test?
▪ What special training will be required of test users for
administering or interpreting the test?
▪ Is there any potential for harm as the result of an
administration of this test?
▪ How will meaning be attributed to scores on the test?
Norm-referenced vs. criterion-referenced tests: Item development issues

Norm-referenced
❑ Norm-referenced tests allow for interpretation in reference to a large standardization sample.
❑ An item that differentiates well between high and low scorers is a good one.
❑ Such tests are insufficient and inappropriate when knowledge of mastery is required.

Criterion-referenced
❑ Criterion-referenced tests evaluate an individual's score with reference to a set standard.
❑ Differentiating between high and low scorers is not what makes an item good or acceptable from a criterion-oriented perspective.
❑ Each item addresses whether the test taker has met certain criteria.
❑ Development of a criterion-referenced test may entail exploratory work with at least two groups of test takers (one that has mastered the material and one that has not). The items that best discriminate between these two groups would be considered "good" items.
❑ Such tests are mostly employed in licensing contexts, be it a license to practice medicine or to drive a car.
Stage 2: Test construction
❑ Measurement
▪ Assignment of numbers according to rules.
❑ Scaling
▪ The process of setting rules for assigning numbers in measurement.
▪ It is the process by which a measuring device is designed and calibrated, and the way numbers (or other indices), called scale values, are assigned to different amounts of the trait, attribute, or characteristic being measured.
▪ Historically, Thurstone is credited with being at the forefront of efforts to develop methodologically sound scaling methods.
▪ Scales can be meaningfully categorized along a continuum of level of measurement and be referred to as nominal, ordinal, interval, or ratio.
Scaling continued…

Types of scales
• Rating scales (e.g., the Likert scale)
• Method of paired comparisons
• Guttman scale
• Adjective checklist
• Likert scales (Likert, 1932) are used extensively in psychology and are relatively easy to construct. Each item presents the test taker with five alternative responses (sometimes seven), usually on an agree–disagree or approve–disapprove continuum.
• Method of paired comparisons: test takers are presented with pairs of stimuli (two photographs, two objects, two statements), which they are asked to compare. They must select one of the stimuli according to some rule; for example, the rule that they agree more with one statement than the other, or the rule that they find one stimulus more appealing than the other.
• Guttman scales (Guttman, 1944a, 1944b, 1947) are ordinal-level measures. Items on them range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured. A feature of Guttman scales is that all respondents who agree with the stronger statements will also agree with the milder statements; for example, on a 10-item scale, a score of 8 means the respondent agreed with the first 8 (mildest) statements. (A scoring sketch follows this list.)
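A minimal Python sketch of the two scoring patterns described above; the item responses and scale lengths are hypothetical, not taken from any published instrument.

    # Hypothetical sketch: summing a 5-point Likert item set and checking a Guttman pattern.
    likert_responses = [4, 5, 3, 4, 2]        # five items, each rated 1 (disagree) to 5 (agree)
    likert_total = sum(likert_responses)      # cumulative Likert score -> 18

    # Guttman items ordered from mildest (index 0) to strongest (last item);
    # a "perfect" pattern has all agreements (1) before any disagreement (0).
    guttman_pattern = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]   # a score of 8 on a 10-item scale

    def is_perfect_guttman(pattern):
        # True if no agreement follows a disagreement, i.e. the cumulative structure holds
        return "01" not in "".join(str(x) for x in pattern)

    print(likert_total, is_perfect_guttman(guttman_pattern))   # 18 True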
Writing Items

⮚ How does one develop items for the item pool?


⮚ The test developer may write a large number of items from
personal experience or academic acquaintance with the subject
matter. Help may also be sought from others, including experts.
⮚ For psychological tests designed to be used in clinical settings, clinicians,
patients, patients’ family members, clinical staff, and others may be
interviewed for insights that could assist in item writing.
⮚ For psychological tests designed to be used by personnel psychologists,
interviews with members of a targeted industry or organization will likely
be of great value.
⮚ For psychological tests designed to be used by school psychologists,
interviews with teachers, administrative staff, educational psychologists,
and others may be invaluable.
How many items should be included?
⮚ It is advisable that the first draft contain approximately twice the number of items that the final version of the test will contain.
⮚ An item pool is the reservoir or well from which
items will or will not be drawn for the final version
of the test.
⮚ A comprehensive sampling provides a basis for
content validity of the final version of the test.
❑ Item format
Variables such as form, plan, structure, arrangement,
and layout of individual test items are collectively
referred to as item format.
Types of item format

Selected-response format
❑ Requires the test taker to select a response from a set of alternative responses.
❑ Examples include multiple choice, matching, and true–false.

Constructed-response format
❑ Requires the test taker to supply or create the correct answer, not merely to select it.
❑ Examples include completion items, the short answer, and the essay.
Scoring items

Cumulative scoring

Class/category scoring

Ipsative scoring
• The rule in a cumulatively scored test is that the higher the
score on the test, the higher the test taker is on the ability,
trait, or other characteristic that the test purports to measure.
• In tests that employ class scoring (also referred to as category scoring), test takers' responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way. This approach is used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis.
• A third scoring model, ipsative scoring, departs radically in rationale from both the cumulative and class models. A typical objective in ipsative scoring is comparing a test taker's score on one scale within a test with that same test taker's score on another scale within the same test. The Edwards Personal Preference Schedule (EPPS) is a well-known example of an ipsative scoring system. (A brief illustration follows.)
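A minimal sketch of the cumulative-versus-ipsative distinction, using made-up scale names and raw scores (not EPPS data):

    # Hypothetical scale scores for a single test taker
    scores = {"need_achievement": 18, "need_affiliation": 12}

    # Cumulative logic: one total, interpreted relative to other test takers
    cumulative_total = sum(scores.values())                                        # 30

    # Ipsative logic: one scale is compared with another scale
    # within the same person, not with other test takers
    ipsative_difference = scores["need_achievement"] - scores["need_affiliation"]  # +6

    print(cumulative_total, ipsative_difference)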
Stage 3: Test tryout

• Having created a pool of items from which the final version of the test will be developed, the test developer will try out the test. The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
• Equally important are questions about the number of people on whom the test should be tried out. An informal rule of thumb is that there should be no fewer than 5 subjects, and preferably as many as 10, for each item on the test (a subject-to-item ratio of at least 5:1); a rough calculation is sketched after this list.
• The test tryout should be executed under conditions as
identical as possible to the conditions under which the
standardized test will be administered; all instructions, and
everything from the time limits allotted for completing the test
to the atmosphere at the test site, should be as similar as
possible.
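As a rough check of the rule of thumb above, the implied tryout sample size can be estimated as follows (the item count is hypothetical):

    # Hypothetical size of the first-draft item pool
    n_items = 100

    # Rule of thumb from the slide: 5 to 10 tryout subjects per item
    min_subjects = 5 * n_items    # 500
    max_subjects = 10 * n_items   # 1000
    print(min_subjects, max_subjects)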
Stage 4: Item analysis

❑ The criteria for the best items may differ as a function of the test developer's objectives.
❑ For example, one test developer might deem the best items to be those that optimally contribute to internal reliability; another may wish to design a test with the highest possible criterion-related validity and thus select items accordingly.
Tools used to analyse and select items:
• an item difficulty index
• an item discrimination index
• an index of item reliability
• an index of item validity
Item difficulty index
❑ An index of an item's difficulty is obtained by calculating the proportion of the total number of test takers who got the item right.
❑ A lowercase italicized p is used to denote item difficulty, and a subscript refers to the item number (p1 is read as "the item difficulty index for item 1").
❑ This value can range from 0 (if no one got the item right) to 1 (if everyone got the item right).
❑ For example, if 50 out of 100 examinees got item 2 right, then the item difficulty index is 50/100 = .5 (p2 = .5).
❑ This statistic is referred to as the item difficulty index in the context of achievement testing, while it is called the item endorsement index in the context of personality testing. There the statistic provides not a measure of the percentage of people passing the item, but a measure of the percentage of people who said yes to, agreed with, or otherwise endorsed the item.
❑ An index of the difficulty of the average test item for a particular test can be calculated by averaging the item difficulty indices for all test items: sum the item difficulty indices and divide by the total number of items (see the sketch after this list).
❑ For maximum discrimination among the abilities of the test takers, the optimal average item difficulty is approximately .5, with individual items on the test ranging in difficulty from about .3 to .8.
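A small sketch of how the p values and the average item difficulty could be computed; the response matrix is made up (rows are examinees, columns are items, 1 = correct):

    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 1, 1],
        [1, 0, 0, 1],
    ]

    n_examinees = len(responses)
    # p for each item: proportion of examinees answering that item correctly
    p_values = [sum(item) / n_examinees for item in zip(*responses)]
    # average item difficulty: mean of the item p values
    average_difficulty = sum(p_values) / len(p_values)

    print(p_values)             # [0.75, 0.5, 0.25, 1.0]
    print(average_difficulty)   # 0.625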
Item reliability index

❑ It provides an indication of the internal consistency of a test; the higher this index, the greater the test's internal consistency.
❑ This index is equal to the product of the item-score standard deviation (s) and the correlation between the item score and the total test score (a computational sketch follows this list).
❑ Factor analysis and inter-item consistency can also be examined to check whether the items on a test appear to be measuring the same thing.
❑ If too many items appear to be tapping a particular area, the weakest of such items can be eliminated.
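A brief computational sketch of the item reliability index, using hypothetical item and total scores (the Pearson correlation is computed by hand to keep the example self-contained):

    from statistics import pstdev

    def pearson(x, y):
        # Pearson correlation between two equal-length score lists
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
        return cov / (pstdev(x) * pstdev(y))

    item_scores  = [1, 0, 1, 1, 0, 1]    # one item, six examinees (1 = correct)
    total_scores = [9, 4, 8, 7, 5, 10]   # total test scores for the same examinees

    # item reliability index = item standard deviation x item-total correlation
    item_reliability_index = pstdev(item_scores) * pearson(item_scores, total_scores)
    print(round(item_reliability_index, 3))   # about 0.42 for these made-up data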
Item validity index

❑ It is a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure; the higher this index, the greater the test's criterion-related validity.
❑ It is particularly important to calculate this index when the test developer's goal is to maximize the criterion-related validity of the test.
❑ The item validity index can be calculated from the following two statistics (see the sketch after this list):
▪ the item-score standard deviation
▪ the correlation between the item score and the criterion score
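The item validity index follows the same pattern as the reliability index, with an external criterion in place of the total score; the sketch below assumes Python 3.10+ (for statistics.correlation) and uses made-up criterion values:

    from statistics import pstdev, correlation

    item_scores      = [1, 0, 1, 1, 0, 1]        # 1 = correct / endorsed (hypothetical)
    criterion_scores = [72, 55, 80, 65, 50, 90]  # external criterion, e.g. ratings (hypothetical)

    # item validity index = item standard deviation x item-criterion correlation
    item_validity_index = pstdev(item_scores) * correlation(item_scores, criterion_scores)
    print(round(item_validity_index, 3))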
Item discrimination index

❑ Measures of item discrimination indicate how adequately an item separates, or discriminates, between high scorers and low scorers on an entire test.
❑ For example, a multiple-choice item on an achievement test is a good item if most of the high scorers answer it correctly and most of the low scorers answer it incorrectly.
❑ The item-discrimination index is symbolized by a lowercase italicized d. The higher the value of d, the greater the number of high scorers answering the item correctly.
❑ This index is a measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering an item correctly.
❑ The formula for calculating it is d = (U - L) / n, where U is the number of test takers in the upper-scoring group who answered the item correctly, L is the number in the lower-scoring group who answered it correctly, and n is the number of test takers in each group.
❑ If the same proportion of the U and L groups pass the item, the item is not discriminating between test takers at all, and d equals 0.
❑ The lowest possible d value is -1. This is a test developer's nightmare: it indicates that all members of the U group failed the item and all members of the L group passed it.
❑ A negative d value is a red flag; it indicates that low scorers are more likely than high scorers to answer the item correctly. (A computational sketch follows.)
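A minimal sketch of d = (U - L) / n with hypothetical upper- and lower-group counts (the group size of 27 is just an example):

    def discrimination_index(upper_correct, lower_correct, group_size):
        # upper_correct and lower_correct are counts of correct answers in the
        # upper and lower scoring groups; group_size is the number in each group
        return (upper_correct - lower_correct) / group_size

    print(discrimination_index(24, 8, 27))    # about 0.59: separates high and low scorers well
    print(discrimination_index(10, 10, 27))   # 0.0: no discrimination
    print(discrimination_index(0, 27, 27))    # -1.0: all low scorers passed, no high scorer did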
Item discrimination indices for five hypothetical items
Translation & Adaptation

Test translation and adaptation

• Test adaptation is a process by which a test (or assessment instrument) is transformed from a source language and/or culture into a target language and/or culture.
Linguistic validation

❑ The aim of linguistic validation is to produce a translated version in a target language that is conceptually equivalent to the original version, as well as clear and easy to understand.
❑ The translated instrument should be understood by most respondents in the selected population and should maintain a reading and comprehension level accessible to most respondents, even those with a low education level.
❑ Conceptual equivalence is the absence of differences in meaning and content between the source-language version and the translated version. It is achieved through a process called linguistic validation.
Steps in standard linguistic validation (Mapi, 2008):

1. Selection and briefing of translators
2. Forward translation
3. Backward translation
4. Pilot testing
5. Validation
6. Proofreading
7. Final tool

Characteristics of the translator

❑ Bilingual
❑ Previous experience in translating tests/instruments
Translation process
• First, the translation process ought to be conducted from the beginning by bilinguals, that is, by people proficient in both languages (forward translation). They should also conduct the so-called back-translation: initially they translate the original version of the instrument, and then they transfer this version back into the original language (backward translation). The two versions are then compared.
• Both versions of a questionnaire can be administered to the same bilingual individuals. If the investigator obtains similar results on both versions, this is a good indicator that the translation was conducted successfully.
• On the basis of comparison with the original versions of the scales, each and every item is reviewed. If any discrepancy is found, that item is revised and reviewed again.
• After the final translated version is shaped, the items are proofread for further clarity.
• Before starting the research, a tryout is carried out to check the feasibility of the data collection and to see whether the sample easily comprehends the items of the scale and responds accordingly.
Thank You ☺
