Psychometric Testing
In probability theory and statistics, variance is the expectation of the squared deviation of a
random variable from its mean. Informally, it measures how far a set of (random) numbers
are spread out from their average value.
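As a small worked illustration of the definition above, the sketch below computes variance as the
mean of the squared deviations from the mean. The scores are made up for illustration and the
function name is chosen only for this example.

```python
from statistics import mean

def population_variance(scores):
    """Mean of the squared deviations of each score from the mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

# Hypothetical test scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# The mean is 5; the squared deviations sum to 32, so the variance is 32 / 8.
variance = population_variance(scores)
```

This treats the scores as the whole population; when the scores are a sample used to estimate a
population variance, the sum is usually divided by n - 1 instead of n.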
28th August
Brackets for intelligence
Wechsler (WAIS–III) 1997 IQ test classification
IQ Range ("deviation IQ")    IQ Classification
130 and above                Very superior
120–129                      Superior
110–119                      High average
90–109                       Average
80–89                        Low average
70–79                        Borderline
69 and below                 Extremely low
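The classification above is a simple lookup from score to band. A minimal sketch, assuming
whole-number IQ scores; the names WAIS_III_BANDS and classify_iq are hypothetical and exist only
for this example.

```python
# Bands copied from the WAIS-III table above: (lowest IQ in the band, label).
WAIS_III_BANDS = [
    (130, "Very superior"),
    (120, "Superior"),
    (110, "High average"),
    (90, "Average"),
    (80, "Low average"),
    (70, "Borderline"),
]

def classify_iq(iq):
    """Return the WAIS-III classification label for a deviation IQ score."""
    # Bands are listed from highest to lowest, so the first match is the right one.
    for lower_bound, label in WAIS_III_BANDS:
        if iq >= lower_bound:
            return label
    return "Extremely low"  # 69 and below

labels = [classify_iq(score) for score in (135, 109, 85, 69)]
```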
11/9 hw
Cross validation and publishing a test (pg 128 to 130 in psychological testing textbook)
Cross validation refers to the practice of applying the original regression equation to a new
sample to determine whether the test predicts the criterion as well as it did in the original sample.
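The idea can be sketched in a few lines: fit the regression equation on the original sample, then
reuse that same equation, without refitting, on a new sample and compare how well it predicts.
All data below are invented for illustration, and the helper names (fit_line, pearson_r, rmse)
are assumptions of this sketch rather than anything from the textbook.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

def rmse(predicted, actual):
    """Root mean squared prediction error."""
    return mean([(p - a) ** 2 for p, a in zip(predicted, actual)]) ** 0.5

# Hypothetical test scores and criterion values (e.g. later grade averages).
original_test = [10, 12, 15, 18, 20, 23, 25, 28]
original_criterion = [2.1, 2.4, 2.6, 3.0, 3.1, 3.4, 3.7, 3.9]
new_test = [11, 14, 16, 19, 22, 24, 27]
new_criterion = [2.0, 2.7, 2.5, 3.2, 3.0, 3.6, 3.8]

# Step 1: derive the regression equation from the original sample only.
slope, intercept = fit_line(original_test, original_criterion)

# Step 2: apply that same equation, unchanged, to both samples.
original_predicted = [intercept + slope * t for t in original_test]
new_predicted = [intercept + slope * t for t in new_test]

# Step 3: compare predictive accuracy in the two samples.
r_original = pearson_r(original_predicted, original_criterion)
r_new = pearson_r(new_predicted, new_criterion)
rmse_original = rmse(original_predicted, original_criterion)
rmse_new = rmse(new_predicted, new_criterion)
```

If the correlation in the new sample is clearly lower, or the prediction error clearly higher,
than in the original sample, the equation does not generalise as well as the original results
suggested.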
Binet Analysis test
The Binet-Simon Scale was developed by Alfred Binet and his student Theodore Simon. The
Stanford-Binet test, a later revision of that scale, is meant to gauge and analyze intelligence
through five factors of cognitive ability: fluid reasoning, knowledge, quantitative
reasoning, visual-spatial processing and working memory. Both verbal and nonverbal
responses are measured. Each of the five factors is given a weight, and the combined score is
often reduced to a single score commonly known as the intelligence quotient, or IQ. The
Stanford-Binet test is the reason we have the IQ scale we are most familiar with today, and the
one on which most high-IQ societies base their admission thresholds. The test is among the most
reliable standardized tests currently used in education. It has undergone many validity studies
and revisions throughout its century-long history.
Measurement is the assignment of scores to individuals so that the scores represent some
characteristic of the individuals. Psychological measurement is often referred to as
psychometrics. Imagine a clinical psychologist who is interested in how depressed a person
is. He administers the Beck Depression Inventory, which is a 21-item self-report
questionnaire in which the person rates the extent to which he or she has felt sad, lost
energy, and experienced other symptoms of depression over the past 2 weeks. The sum of
these 21 ratings is the score and represents his or her current level of depression. The
important point here is that measurement does not require any particular instruments or
procedures. It requires only some systematic procedure for assigning scores to individuals or
objects so that those scores represent the characteristic of interest.
Many variables studied by psychologists are straightforward and simple to measure. These
include sex, age, height, weight, and birth order. Other variables studied by psychologists
(perhaps the majority) are not so straightforward or simple to measure. We cannot accurately
assess people’s level of intelligence by looking at them, and we certainly cannot put their
self-esteem on a bathroom scale. These kinds of variables are called constructs (pronounced
CON-structs) and include personality traits (e.g., extroversion), emotional states (e.g., fear),
attitudes (e.g., toward taxes), and abilities (e.g., athleticism).
Psychological constructs cannot be observed directly. One reason is that they often represent
tendencies to think, feel, or act in certain ways. For example, to say that a particular college
student is highly extroverted (see Note 5.6 “The Big Five”) does not necessarily mean that she is
behaving in an extroverted way right now.
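Returning to the Beck Depression Inventory example above, the scoring rule (sum the 21 item
ratings) is an instance of a "systematic procedure for assigning scores". A minimal sketch
follows; the 0 to 3 rating range per item and the response values are assumptions made for
illustration, since these notes state only that there are 21 ratings and that they are summed.

```python
N_ITEMS = 21
MIN_RATING, MAX_RATING = 0, 3  # assumed rating range for each item

def inventory_total(ratings):
    """Sum the item ratings after checking that the response set is complete and valid."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(not MIN_RATING <= r <= MAX_RATING for r in ratings):
        raise ValueError("every rating must lie in the assumed 0 to 3 range")
    return sum(ratings)

# Hypothetical responses from one person, one rating per item.
responses = [1, 0, 2, 1, 0, 0, 3, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 1, 0, 2, 1]
total = inventory_total(responses)
```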
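1. The nominal level of measurement is used for categorical variables and involves assigning
scores that are category labels. Category labels communicate whether any two individuals
are the same or different in terms of the variable being measured.
2. The ordinal level of measurement involves assigning scores so that they represent
the rank order of the individuals. Ranks communicate not only whether any two
individuals are the same or different in terms of the variable being measured but
also whether one individual is higher or lower on that variable.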
3. The interval level of measurement involves assigning scores so that they represent
the precise magnitude of the difference between individuals, but a score of zero
does not actually represent the complete absence of the characteristic. A classic
example is the measurement of heat using the Celsius or Fahrenheit scale. The
difference between temperatures of 20°C and 25°C is precisely 5°, but a temperature
of 0°C does not mean that there is a complete absence of heat. In psychology, the
intelligence quotient (IQ) is often considered to be measured at the interval level.
4. The ratio level of measurement involves assigning scores in such a way that there is a
true zero point that represents the complete absence of the quantity. Height
measured in meters and weight measured in kilograms are good examples. So are
counts of discrete objects or events such as the number of siblings one has or the
number of questions a student answers correctly on an exam.
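The practical difference between the interval and ratio levels is which arithmetic is meaningful.
The sketch below uses the temperature example from the list: differences are meaningful on the
Celsius scale, but ratios only become meaningful on a scale with a true zero. The Kelvin
conversion is brought in here purely for illustration and is not part of the original notes.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has a true zero (absolute zero), so it is a ratio scale."""
    return celsius + 273.15

# Interval level: differences are meaningful and agree across both scales.
difference_c = 25.0 - 20.0
difference_k = celsius_to_kelvin(25.0) - celsius_to_kelvin(20.0)

# Ratios on an interval scale are not meaningful: 40 degrees C is not "twice as hot"
# as 20 degrees C, because 0 degrees C is an arbitrary zero point.
naive_ratio = 40.0 / 20.0

# Ratio level: on the Kelvin scale the same two temperatures differ by about 7%.
true_ratio = celsius_to_kelvin(40.0) / celsius_to_kelvin(20.0)
```

The same caution applies to IQ when it is treated as interval data: a 10-point difference can be
compared with another 10-point difference, but an IQ of 140 is not "twice" an IQ of 70.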
Indirect methods typically involve the use of a projective test. A projective test involves
presenting a person with an ambiguous (i.e. unclear) or incomplete stimulus (e.g. a picture or
words). The stimulus requires interpretation from the person, so the person’s
attitude is inferred from their interpretation of the ambiguous or incomplete stimulus. The
assumption behind these measures of attitudes is that the person will “project” his or her
views, opinions or attitudes into the ambiguous situation, thus revealing the attitudes the
person holds. However, indirect methods provide only general information and do not offer
a precise measurement of attitude strength, since the information is qualitative rather than
quantitative. A major criticism is that this method of attitude measurement is not objective or
scientific. Examples are the Rorschach Inkblot Test and the Thematic Apperception Test (TAT).
13/9 -Types of Scales
1. Dichotomous Scales- A dichotomous question is a question that can have only two
possible answers. Dichotomous questions are usually used in surveys that ask for a
Yes/No, True/False or Agree/Disagree answer. They are used for clear distinction of
qualities, experiences or respondents’ opinions. Dichotomous questions (Yes/No) may seem
simple, but they have a few problems, both on the part of the survey respondent and in terms
of analysis. Yes/No questions often force respondents to choose between options when the
matter may not be that simple, and may lead a subject to decide on an option that doesn’t truly
capture their feelings.
Eg- Myers-Briggs Type Indicator (MBTI). MBTI reports tell you your preference for each of
four pairs: Extraversion or Introversion (E or I), Sensing or Intuition (S or N), Thinking or
Feeling (T or F), and Judging or Perceiving (J or P).
Binary variables are a sub-type of dichotomous variable; variables assigned either a 0
or a 1 are said to be in a binary state, for example male (0) and female (1). Dichotomous
variables can be further described as either discrete dichotomous variables or continuous
dichotomous variables. The idea is very similar to regular discrete variables and continuous
variables. When the two categories of a dichotomous variable are discrete, there is nothing in
between them; when they are continuous, the two categories are cut from an underlying
continuum, so there are possibilities in between. “Dead or Alive” is a
discrete dichotomous variable. You can only be dead. Or you can only be alive. “Passing or
Failing an Exam” is a continuous dichotomous variable. Grades on a test can range from 0 to
100% with every possible percentage in between. You could get 74% and pass. You could
get 69% and fail. Or a 69.5% and pass (if your professor rounds up!).
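The pass/fail example can be written out as a cut on a continuous grade. A minimal sketch: the
pass mark of 70% is an assumption inferred from the examples above (69% fails, 69.5% passes only
when rounded up), and the function name is hypothetical.

```python
import math

PASS_MARK = 70  # assumed cut-off, inferred from the 69% / 69.5% / 74% examples

def passes(percentage, round_up=False):
    """Collapse a continuous grade (0 to 100) into a dichotomous pass/fail outcome."""
    if round_up:
        # Round half up explicitly; Python's built-in round() rounds halves to even.
        percentage = math.floor(percentage + 0.5)
    return percentage >= PASS_MARK

outcomes = [
    passes(74),                   # passes
    passes(69),                   # fails
    passes(69.5),                 # fails without rounding
    passes(69.5, round_up=True),  # passes if the professor rounds up
]

# The same outcomes coded as a binary variable (1 = pass, 0 = fail).
binary_codes = [int(outcome) for outcome in outcomes]
```

Note how much information the dichotomy discards: 70% and 100% receive the same code.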
2. Ipsative scale- “Ipsative” is a descriptor used in psychology to indicate a specific type of
measure in which respondents compare two or more desirable options and pick the one that is most
preferred (sometimes called a "forced choice" scale). An ipsative measurement presents
respondents with options of equal desirability; thus, the responses are less likely to be
confounded by social desirability. Respondents are forced to choose one option that is
“most true” of them and another that is “least true” of them. A major
underlying assumption is that when respondents are forced to choose among four equally
desirable options, the one option that is most true of them will tend to be perceived as
more positive. Similarly, when forced to choose one that is least true of them, those to
whom one of the options is less applicable will tend to perceive it as less positive. For
example, ipsative forms give the applicant a choice of 2-4 equally positive statements, and the
applicant must indicate a preference for, or agreement with, one of them, such as choosing
between “I enjoy social events” and “I like to keep organised”. This
forces the person to think more about their answer, and hopefully to answer more truthfully, as
there is not one obviously desirable quality to pick.
Because choosing one option necessarily means not choosing the others, a respondent’s scores on
the different scales depend on one another. This measurement dependency violates one of the
basic assumptions of classical test theory (independence of error variance), which has
implications for the statistical analysis of ipsative scores, as well as for their interpretation.
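The dependency is easy to see in a toy scoring scheme. In the sketch below, each forced-choice
block adds one point to the trait chosen as "most true" and subtracts one from the trait chosen
as "least true". The trait names, the +1/-1 scoring rule and the responses are all assumptions
made for illustration; real ipsative instruments use their own scoring keys.

```python
TRAITS = ["sociable", "organised", "curious", "calm"]  # hypothetical scales

def score_ipsative(responses):
    """Score (most_true, least_true) choices: +1 for 'most', -1 for 'least'."""
    scores = {trait: 0 for trait in TRAITS}
    for most_true, least_true in responses:
        scores[most_true] += 1
        scores[least_true] -= 1
    return scores

# One respondent's choices across four forced-choice blocks (invented data).
responses = [
    ("sociable", "calm"),
    ("organised", "calm"),
    ("sociable", "curious"),
    ("curious", "organised"),
]

scores = score_ipsative(responses)

# Whatever the respondent chooses, the scale scores always sum to the same constant
# (zero here), so a high score on one scale forces lower scores elsewhere.
total = sum(scores.values())
```

Because the total is fixed, the scores describe the relative strength of traits within one
person and are hard to compare between people.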
3. Q sort/rank order- Q-sort scaling is a rank-order scaling technique in which
respondents are asked to sort the presented objects into piles based on similarity according
to a specified criterion such as preference, attitude, perception, etc. In other words, it is a
scaling technique in which the respondents sort a number of statements or attitudes into
piles, usually 11, on the basis of some specified criterion. For example, suppose the
respondents are given 100 motivational statements on individual cards and are asked to
place these in 11 piles, ranging from the “most agreed with” to the “least agreed with”.
Generally, the most agreed-with statements are placed at the top and the least agreed-with
statements at the bottom.
Test-Retest Reliability
Alternate-forms reliability
Inter-item consistency- The degree to which every test item measures the same construct.
Split-Half reliability
Coefficient Alpha
Kuder-Richardson
Interscorer Reliability
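Three of the internal-consistency estimates listed above can be computed from a single
administration of a test. The sketch below is one way to do so under stated assumptions: the
response matrix is invented, items are scored 0/1, population variances are used throughout, and
the split-half estimate uses an odd/even split with the Spearman-Brown correction. The function
names are chosen only for this example.

```python
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def kr20(rows):
    """Kuder-Richardson formula 20, for items scored 0/1 only."""
    k = len(rows[0])
    p = [mean([row[j] for row in rows]) for j in range(k)]  # proportion correct per item
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

def split_half(rows):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    odd_half = [sum(row[0::2]) for row in rows]
    even_half = [sum(row[1::2]) for row in rows]
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 6 respondents x 4 items, 1 = correct, 0 = incorrect.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(data)
kr20_value = kr20(data)
split_half_value = split_half(data)
```

For 0/1 items the variance of an item equals p(1 - p), so KR-20 and coefficient alpha give the
same value when the same variance convention is used; alpha is the more general formula because
it also handles items with more than two score points.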
Factors influencing the reliability of test scores:
There are some intrinsic and extrinsic factors which affect the reliability of the test scores:
1. Length of the test: The reliability of the test generally increases with its length.
2. Speed: In a speed test, reliability will be problematic. This is because not every student
can complete all of the items in a speed test. In contrast, a power test is a test in
which every student is able to complete all the items.
3. Group Homogeneity: The test is more reliable if the group of students to which the
test is administered is more heterogeneous.
4. Item Difficulty: The test items should be of moderate difficulty so as to maintain
the reliability of the test, i.e. the items of the test should be neither very easy nor very
hard.
5. Objectivity: An objective test will have higher reliability than a subjective test.
6. Variation with the testing situation: Deviation during the administration of the test
such as noise level and distraction can cause test scores to vary, which may affect
the reliability of the test.
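The effect of test length (factor 1 above) is commonly estimated with the Spearman-Brown
prophecy formula, which is brought in here as an illustration and is not named in these notes.
It assumes the added items are comparable to the existing ones; the starting reliability of 0.70
is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test is made length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

current_reliability = 0.70  # hypothetical reliability of the existing test

doubled = spearman_brown(current_reliability, 2)    # twice as many comparable items
halved = spearman_brown(current_reliability, 0.5)   # half as many items
unchanged = spearman_brown(current_reliability, 1)  # same length, same reliability
```

Lengthening the test raises the predicted reliability and shortening it lowers it, but with
diminishing returns: each further doubling adds less than the one before.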
Characteristics of a good test:
1. Objectivity: The test should be free from subjective judgement regarding the ability,
skill, knowledge, trait or potentiality to be measured and evaluated.
2. Reliability: This refers to the extent to which the obtained results are consistent.
When the test is administered to the same sample more than once with a
reasonable gap of time, a reliable test will yield the same or very similar scores. It means
the test is trustworthy. There are many methods of testing the reliability of a test.
3. Validity: This refers to the extent to which the test measures what it intends to measure. For
example, when an intelligence test is developed to assess the level of intelligence, it should
assess the intelligence of the person, not other factors. Validity tells us whether the test
fulfils the objective of its development. There are many methods of assessing the validity of a test.
4. Norms: Norms refer to the average performance of a representative sample on a given
test. They give a picture of the average standard of a particular sample in a particular aspect.
Norms are the standard scores developed by the person who develops the test. Future
users of the test can compare their scores with the norms to know the level of their sample.
5. Practicability: The test must be practicable in terms of the time required for completion,
its length, the number of items or questions, scoring, etc. The test should not be too lengthy
or too difficult to answer or to score.
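Comparing a score with norms (point 4 above) usually means expressing a raw score relative to
the mean and standard deviation of the norm sample. A minimal sketch: the norm sample below is
invented and far smaller than any real norm group, and the rescaling to a mean of 100 and a
standard deviation of 15 follows the deviation-IQ convention, used here only as an example of a
standard-score scale.

```python
from statistics import mean, pstdev

def z_score(raw, norm_scores):
    """How many standard deviations a raw score lies above or below the norm mean."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def to_standard_scale(z, scale_mean=100, scale_sd=15):
    """Re-express a z-score on a standard-score scale (defaults: mean 100, SD 15)."""
    return scale_mean + scale_sd * z

# Hypothetical raw scores from a norm sample (mean 50, standard deviation 20).
norm_sample = [20, 40, 40, 40, 50, 50, 70, 90]

z = z_score(80, norm_sample)       # 80 is 1.5 standard deviations above the norm mean
standard = to_standard_scale(z)    # the same position on the 100 / 15 scale
```

A raw score of 80 means little on its own; the norm-referenced value shows where it falls
relative to the comparison group.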