Psychological Testing
& Assessment
Chapter 4
RELIABILITY
Things to be discussed:
• Define what reliability is
• Discuss Spearman’s early studies
• Describe the conceptualization of error
• Know the different sources of error
• Discuss the reliability in behavioral studies
• Explain the different methods of estimating reliability
• Answer the question “how reliable is reliable?”
• Know how to calculate the reliability of a test
• Discuss what to do about low reliability
What is reliability?
Reliability
It is the degree to which a measurement produces consistent results.
Reliability coefficient
an index that expresses the ratio of the true score variance to the observed score variance.
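The ratio described above can be written compactly in classical test theory notation. The symbols are an assumption, since the slides give the definition only in words: X is the observed score, T the true score, E the error, and r_xx the reliability coefficient, with T and E assumed to be uncorrelated.

```latex
X = T + E, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Read this way, a coefficient of 1 would mean no error variance at all, and a coefficient of 0 would mean the observed scores are entirely error.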
Spearman’s Early Studies
Charles Spearman
(1863 – 1945)
Conceptualization of Error
Measurement error
refers to all the factors associated with the process of measuring some
variable, other than the variable being measured.
Random error
An error caused by unpredictable fluctuations of other variables in the measurement process.
Systematic error
An error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Sources of Error
• Time Sampling
• Item Sampling
• Test Administration
Time Sampling
Carryover effect
• performance in one condition is affected by the condition that precedes it
Practice effect
• improvement due to repeated practice
Item Sampling
Variation among items within a test as well as variation among items between tests.
Test Administration
• Room temperature, level of lighting, ventilation, surrounding noise, etc.
• Test-taker variables
• Examiner-related variables
Methods of Estimating Reliability
Test-Retest Method
Parallel Forms Method
Split-Half Method
Test-Retest Method
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Parallel Forms Method
an estimate of reliability that compares two equivalent forms of a test that measure the same attribute.
Split-Half Method
an estimate of reliability obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once (for example, using the odd-even system).
Calculating the reliability of a test
Correlation coefficient
• a mathematical index that describes the direction and magnitude of a relationship.
Pearson product-moment correlation (r)
Test-Retest Method
Parallel Forms Method
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]
Where:
n = number of pairs of scores
∑xy = sum of the products of paired scores
∑x = sum of x scores (scores on the 1st administration)
∑y = sum of y scores (scores on the 2nd administration)
∑x² = sum of squared x scores
∑y² = sum of squared y scores
X (1st administration)   Y (2nd administration)   x²      y²      xy
43                       38                       1,849   1,444   1,634
32                       41                       1,024   1,681   1,312
41                       44                       1,681   1,936   1,804
46                       45                       2,116   2,025   2,070
45                       40                       2,025   1,600   1,800
32                       36                       1,024   1,296   1,152
48                       48                       2,304   2,304   2,304
36                       34                       1,296   1,156   1,224
40                       39                       1,600   1,521   1,560
50                       49                       2,500   2,401   2,450
∑x = 413                 ∑y = 414                 ∑x² = 17,419   ∑y² = 17,364   ∑xy = 17,310
n = 10
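\[ r = \frac{10(17310) - (413)(414)}{\sqrt{\left[10(17419) - (413)^2\right]\left[10(17364) - (414)^2\right]}} \]
\[ r = \frac{173100 - 170982}{\sqrt{(174190 - 170569)(173640 - 171396)}} \]
\[ r = \frac{2118}{\sqrt{(3621)(2244)}} = \frac{2118}{\sqrt{8125524}} \]
\[ r = \frac{2118}{2850.53} \]
\[ r \approx 0.74 \]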
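The same calculation can be checked in code. The following is a minimal Python sketch using the ten score pairs from the table above; the function name `pearson_r` and the variable names are illustrative choices, not something the slides prescribe.

```python
import math

def pearson_r(x, y):
    """Pearson r computed with the raw-score formula (sums of x, y, x^2, y^2, xy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Scores from the 1st and 2nd administrations (same ten people)
first = [43, 32, 41, 46, 45, 32, 48, 36, 40, 50]
second = [38, 41, 44, 45, 40, 36, 48, 34, 39, 49]

r_test_retest = pearson_r(first, second)
print(round(r_test_retest, 2))  # 0.74
```

For the parallel forms method the same function applies, with the second list holding scores on the equivalent form instead of the second administration.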
Spearman-Brown Formula
Split-Half Method
\[ r_{SB} = \frac{2r_{hh}}{1 + r_{hh}} \]
Where:
rSB = reliability of the entire test
rhh = correlation between two halves
Student   Total Score   Odd (x)   Even (y)
1         42            20        22
2         33            15        18
3         44            21        23
4         45            24        21
5         30            16        14
6         26            11        15
7         45            24        21
8         35            18        17
9         40            23        17
10        32            16        16
11        34            15        19
12        45            22        23
n = 12
∑x = 225   ∑y = 226   ∑x² = 4,413   ∑y² = 4,364   ∑xy = 4,334
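\[ r_{hh} = \frac{12(4334) - (225)(226)}{\sqrt{\left[12(4413) - (225)^2\right]\left[12(4364) - (226)^2\right]}} \]
\[ r_{hh} = \frac{52008 - 50850}{\sqrt{(52956 - 50625)(52368 - 51076)}} \]
\[ r_{hh} = \frac{1158}{\sqrt{(2331)(1292)}} = \frac{1158}{\sqrt{3011652}} \]
\[ r_{hh} = \frac{1158}{1735.41} \approx 0.667 \]
\[ r_{SB} = \frac{2r_{hh}}{1 + r_{hh}} = \frac{2(0.667)}{1 + 0.667} = \frac{1.334}{1.667} \approx 0.80 \]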
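As a cross-check of the split-half procedure, here is a short Python sketch that correlates the odd and even half scores of the twelve students and then applies the Spearman-Brown correction. The helper names `pearson_r` and `spearman_brown` are assumptions made for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r from the raw-score formula."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    spread_x = n * sum(a * a for a in x) - sum(x) ** 2
    spread_y = n * sum(b * b for b in y) - sum(y) ** 2
    return numerator / math.sqrt(spread_x * spread_y)

def spearman_brown(r_hh):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_hh / (1 + r_hh)

# Odd-item and even-item half scores for the twelve students
odd = [20, 15, 21, 24, 16, 11, 24, 18, 23, 16, 15, 22]
even = [22, 18, 23, 21, 14, 15, 21, 17, 17, 16, 19, 23]

r_hh = pearson_r(odd, even)
r_sb = spearman_brown(r_hh)
print(round(r_hh, 3), round(r_sb, 2))  # 0.667 0.8
```

The corrected value is higher than the half-test correlation because each half contains only half of the items, and shorter tests tend to be less reliable.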
KR20 Formula
Split-Half Method
\[ KR_{20} = \frac{N}{N-1}\left(\frac{S^2 - \sum pq}{S^2}\right) \]
Where:
KR20 = the reliability estimate (r)
N = the number of items on a test
S² = the variance of the total score
p = the proportion of the people getting each item correct
q = the proportion of the people getting each item incorrect
∑pq = the sum of the products of p and q for each item on a test
Polychotomous
has more than 2 possible outcomes
(Spearman-Brown Formula, Coefficient Alpha)
Dichotomous
has only 2 possible outcomes (KR20 Formula)
Math Problems (1 = correct, 0 = incorrect)
Student   1. 5+3   2. 7+2   3. 9+1   4. 6+3   5. 8+6   6. 7+5   7. 4+7   8. 9+2   9. 8+4   10. 5+6
1         1        1        1        1        1        1        1        1        1        1
2         1        0        0        1        0        0        1        1        0        1
3         1        0        1        0        0        1        1        1        1        0
4         1        0        1        1        1        0        0        1        0        0
5         0        0        0        0        0        1        1        0        1        1
6         0        1        1        1        1        1        1        1        1        1
7         0        1        1        1        1        1        1        1        1        1
8         0        0        1        1        0        1        1        0        1        0
9         0        1        1        1        1        1        1        1        1        1
10        0        0        1        1        0        1        0        1        1        1
11        0        0        1        1        0        0        0        0        0        1
12        1        1        0        0        0        1        0        0        1        1
13        1        1        1        1        1        1        1        1        1        1
14        0        1        1        1        0        0        0        0        1        0
15        0        1        1        1        1        1        1        1        1        1
Proportion of students getting each item correct (p)
Item             1      2      3      4      5      6      7      8      9      10
# of 1s          6      8      12     12     7      11     10     10     12     11
Proportion (p)   0.40   0.53   0.80   0.80   0.47   0.73   0.67   0.67   0.80   0.73
Proportion of students getting each item incorrect (q)
Item             1      2      3      4      5      6      7      8      9      10
# of 0s          9      7      3      3      8      4      5      5      3      4
Proportion (q)   0.60   0.47   0.20   0.20   0.53   0.27   0.33   0.33   0.20   0.27
Product of p and q for each item
Item    1      2      3      4      5      6      7      8      9      10
p × q   0.24   0.25   0.16   0.16   0.25   0.20   0.22   0.22   0.16   0.20
∑pq = 2.05
Total Score (number of items each student answered correctly)
Student       1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
Total Score   10   5    6    5    4    9    9    5    9    6    3    5    10   4    9
∑X = 99
X̄ = 99 / 15 = 6.6
Calculate the variance using Excel: S² = 5.57 (the population variance, as returned by VAR.P)
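\[ KR_{20} = \frac{N}{N-1}\left(\frac{S^2 - \sum pq}{S^2}\right) \]
\[ KR_{20} = \frac{10}{10-1}\left(\frac{5.57 - 2.05}{5.57}\right) \]
\[ KR_{20} = 1.11\left(\frac{3.52}{5.57}\right) \]
\[ KR_{20} = 1.11(0.632) \]
\[ KR_{20} \approx 0.70 \]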
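The KR20 steps can be reproduced from the item responses directly. Below is a minimal Python sketch that rebuilds the 15 × 10 response matrix from the Math Problems table; the function name `kr20` is an illustrative choice, and the population variance (`statistics.pvariance`) is used on the assumption that it matches the S² = 5.57 reported in the slides.

```python
from statistics import pvariance

# Rows are students 1-15, columns are items 1-10 (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]

def kr20(rows):
    """KR20 for dichotomous items, using the population variance of total scores."""
    n_people = len(rows)
    n_items = len(rows[0])
    total_variance = pvariance([sum(row) for row in rows])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in rows) / n_people  # proportion correct
        sum_pq += p * (1 - p)                       # q = 1 - p
    return (n_items / (n_items - 1)) * ((total_variance - sum_pq) / total_variance)

total_variance = pvariance([sum(row) for row in responses])
kr20_value = kr20(responses)
print(round(total_variance, 2), round(kr20_value, 2))  # 5.57 0.7
```

Using the sample variance (dividing by n − 1) instead would give a slightly different total-score variance, so the choice of variance formula should be stated whenever KR20 is reported.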
Coefficient Alpha
Split-Half Method
\[ r = \alpha = \frac{N}{N-1}\left(\frac{S^2 - \sum S_i^2}{S^2}\right) \]
Where:
r = the reliability estimate
N = the number of items on a test
S² = the variance of the total score
S²i = the variance of the individual items
Students    Item 1        Item 2        Item 3        Total
1           6             6             8             20
2           5             5             6             16
3           9             8             6             23
4           3             2             4             9
5           2             3             2             7
6           1             1             2             4
7           5             4             6             15
Variance:   S²i = 7.29    S²i = 5.81    S²i = 5.14    S² = 48.95
∑S²i = 7.29 + 5.81 + 5.14 = 18.24
S² = 48.95
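\[ r = \frac{N}{N-1}\left(\frac{S^2 - \sum S_i^2}{S^2}\right) \]
\[ r = \frac{3}{3-1}\left(\frac{48.95 - 18.24}{48.95}\right) \]
\[ r = 1.5\left(\frac{30.71}{48.95}\right) \]
\[ r = 1.5(0.627) \approx 0.94 \]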
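Coefficient alpha for the three-item example can be verified with a few lines of Python. This is a sketch, not the slides' own procedure: the function name `coefficient_alpha` is illustrative, and the sample variance (`statistics.variance`, dividing by n − 1) is used because it reproduces the variances shown in the table (7.29, 5.81, 5.14, 48.95).

```python
from statistics import variance

# Rows are students 1-7, columns are Item 1, Item 2, Item 3
scores = [
    [6, 6, 8],
    [5, 5, 6],
    [9, 8, 6],
    [3, 2, 4],
    [2, 3, 2],
    [1, 1, 2],
    [5, 4, 6],
]

def coefficient_alpha(rows):
    """Coefficient alpha from item variances and total-score variance (sample variances)."""
    n_items = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (
        (total_variance - sum(item_variances)) / total_variance
    )

alpha = coefficient_alpha(scores)
print(round(alpha, 2))  # 0.94
```

Unlike KR20, this calculation does not require items to be scored 0 or 1, which is why it suits items with more than two possible outcomes.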
RELIABILITY IN BEHAVIORAL OBSERVATION STUDIES
Source of error: observer differences
Inter-scorer reliability
the degree of agreement or consistency between two or more observers regarding a particular measure
Kappa statistic
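The slides name the kappa statistic without giving a formula. One common version is Cohen's kappa for two observers, which compares observed agreement with the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below assumes that version; the ratings are hypothetical values invented purely for illustration, and the function name `cohens_kappa` is likewise an assumption.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per observation."""
    n = len(rater_a)
    # Proportion of observations on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical data: two observers code ten intervals as on-task or off-task
observer_1 = ["on", "on", "off", "on", "off", "off", "on", "on", "off", "on"]
observer_2 = ["on", "off", "off", "on", "off", "on", "on", "on", "off", "on"]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2))  # 0.58
```

Here the observers agree on 8 of 10 intervals (80%), but kappa is lower than 0.80 because some of that agreement would be expected by chance alone.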
How reliable is reliable?
Reliability estimates in the range of .70 to .80 are good enough for most purposes in basic research.
In clinical settings, a test with a reliability of .90 might not be good enough; evaluators should attempt to find a test with a reliability greater than .95.
What to do about low reliability?
Increase the number of items
Factor analysis
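For the first remedy, the general form of the Spearman-Brown formula gives a rough estimate of how much longer a test would need to be to reach a target reliability. The sketch below is an illustration under assumptions: the function names and the example values (.70 observed, .90 desired) are chosen here, and the estimate holds only if the added items are comparable to the existing ones.

```python
def length_factor(r_observed, r_desired):
    """How many times longer the test must be to reach the desired reliability."""
    return (r_desired * (1 - r_observed)) / (r_observed * (1 - r_desired))

def projected_reliability(r_observed, k):
    """Reliability expected when the test is made k times as long."""
    return (k * r_observed) / (1 + (k - 1) * r_observed)

# Illustrative values: observed reliability .70, desired reliability .90
factor = length_factor(0.70, 0.90)
print(round(factor, 2))                               # 3.86
print(round(projected_reliability(0.70, factor), 2))  # 0.9
```

Under these assumptions, a 10-item test with a reliability of .70 would need roughly 39 comparable items to reach .90.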
Chapter 5
VALIDITY
Things to be discussed:
• Define what validity is
• Explain the 3 types of evidence
• Discuss the booklet published by a joint committee
• Discuss the aspects of validity
• Discuss validity coefficient
• Discuss the relationship between reliability and validity
What is validity?
Validity
the agreement between
a test score or measure
and the quality it is
believed to measure
American Educational Research Association
(AERA), American Psychological Association
(APA), and the National Council on
Measurement in Education (NCME)
Standards for Educational and
Psychological Testing
3 TYPES OF EVIDENCE
Content-Related Evidence for Validity
Criterion-Related Evidence for Validity
Construct-Related Evidence for Validity
Aspects of validity
Content-Related Evidence for Validity
• considers the adequacy of representation of
the conceptual domain the test is designed to
cover
Two concepts:
Construct underrepresentation
• describes the failure to capture important components of a construct
Construct-irrelevant variance
• occurs when scores are influenced by factors irrelevant to the construct
Criterion-Related Evidence for Validity
• tells us just how well a test corresponds
with a particular criterion
Predictive validity
• is an index of the degree to which a test score predicts
some criterion measure
Concurrent validity
• is an index of the degree to which a test score is related to
some criterion measure obtained at the same time
(concurrently).
Validity coefficient
is a statistical index used to report evidence of
validity for intended interpretations of test scores
Construct-Related Evidence for Validity
• demonstrates whether a test measures its intended construct
Donald Campbell & Donald Fiske
Convergent evidence
• It is obtained when a measure correlates well with other tests that are believed to measure the same construct
Discriminant evidence
• It is obtained when a measure does not correlate well with tests that measure other constructs
Relationship between
Reliability and Validity
• Reliability and validity are related concepts.
• Although different, they work together: a test cannot be valid unless it is reliable, but a reliable test is not necessarily valid.
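One way to make this relationship concrete is the standard result that the validity coefficient between a test and a criterion cannot exceed the square root of the product of their reliabilities. The slides do not present this formula, so the sketch below is an added illustration; the function name `max_validity` and the reliability values are hypothetical.

```python
import math

def max_validity(r_test, r_criterion):
    """Upper bound on the validity coefficient given the two reliabilities."""
    return math.sqrt(r_test * r_criterion)

# Hypothetical reliabilities: .70 for the test, .80 for the criterion measure
ceiling = max_validity(0.70, 0.80)
print(round(ceiling, 2))  # 0.75
```

With these values, even a test that measured exactly the right construct could not show a validity coefficient above about .75, which is one reason low reliability limits validity.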
fin