Chapter 4 Lecture Notes Part I

Hello and welcome to Part I - a review of Measurement in research.

When we conduct research, we use conceptualization to identify the most important variables we wish to study. A variable is something that can change or vary. For example, a variable of interest might be trauma in veterans. Conceptualization involves stating exactly what is meant by each variable. In other words, conceptualization means that we define the variable. Many concepts may be abstract; therefore, conceptualization involves making abstract concepts specific and precise. For example, being diagnosed with posttraumatic stress disorder might be a way we identify trauma in veterans.

When we operationalize a variable, we specify how we’re going to measure the variable. For example,
we might measure the severity of posttraumatic stress disorder by using an inventory or scale designed
to measure trauma symptoms.

One very important thing to know when conducting research is that variables have characteristics. A
variable’s level of measurement defines the nature of that variable and determines what statistical
methods can be used to analyze the variable. There are four levels of measurement: nominal, ordinal,
interval, and ratio. Let’s review each of these. Nominal level variables are the lowest level of
measurement. These are categorical variables that fall into groups such as gender, race, or political
party. The next measurement level is ordinal. Ordinal level variables are also categorical variables that
have some order or rank from low to high, or small to large. For example, class rank such as being a
freshman, sophomore, junior, or senior, would be an example of an ordinal level variable. Another example would be levels of satisfaction, which are often measured on surveys. For example, you may respond to a customer satisfaction survey that asks whether you were totally unsatisfied, somewhat satisfied, or totally satisfied.

The next level of measurement is interval. Interval level variables are continuous, which means they are
numerical. Interval level variables are equally spaced on a continuum. Temperature is an example of an
interval level variable. The degrees on a thermometer are evenly spaced. There is the same distance
between 1° and 2° as there is between 99° and 100°. The last and highest level of measurement is the
ratio level of measurement. Ratio level variables are continuous. This means they are numerical values
that indicate the actual amount of the property being measured. For example, age, height, and weight
are all continuous ratio level variables. There is sometimes confusion distinguishing between interval
level and ratio level variables. Let’s review some examples.

Nominal level variables are fairly easy to recognize. They are variables that fall into categories and
cannot be rank-ordered. For example, if we had a category called animals, it could include items such
as cats, dogs, cows, or pigs. These are clearly not numerical, and cannot be rank-ordered.

As mentioned, ordinal level variables are also categorical variables, but they are rank-ordered. We see
ordinal level variables most frequently in surveys that ask respondents to select some type of ordered
response. For instance, from low to high or small to large.

Some examples of interval level variables are time, IQ score, and temperature. The difference between an interval level variable and a ratio level variable is that interval level variables do not have a true zero point. For example, the time on a clock would be considered an interval level variable because there is no true zero value. It begins at 12:00 AM and ends at 11:59 PM. Each unit (seconds, minutes, hours) is evenly spaced. For example, the distance between 1:00 PM and 1:01 PM is 1 minute. Likewise, the distance between 6:05 PM and 6:06 PM is also 1 minute. Each time unit is evenly spaced. Temperature is also considered an interval level variable because there is no true zero value. Of course, we could have a temperature that is 0° Fahrenheit, but 0° F does not mean a complete absence of heat, and it is not even the same temperature as 0° Celsius. The zero point is arbitrary, so there is no true zero value on a thermometer. IQ is also an interval level variable because there is no true zero value.

Like interval level variables, ratio level variables also have evenly spaced units. However, ratio level
variables have a true zero value. For example, number of children would be considered a ratio level
variable, because it is possible to have 0 children. Likewise, amount of money earned per year, or height
measured in inches would also be considered ratio level variables.

Table 1.1 shows the characteristics of the four levels of measurement. All four levels of measurement
can have more than one attribute or value. If the attributes or values are rank-ordered, then the
variable cannot be nominal. If there is equal distance between each of the attributes, then the variable
cannot be nominal or ordinal. And if the variable has a true zero point, it can only be a ratio level variable.
Although it is important to know the difference between each of the levels of measurement, for the
purposes of statistical analysis interval and ratio level variables are often treated similarly.
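
To make these distinctions concrete, here is a minimal Python sketch using pandas (my choice, not part of the lecture; the data are made up) showing how nominal, ordinal, and numeric variables might be represented:

```python
import pandas as pd

# Made-up data illustrating the four levels of measurement
df = pd.DataFrame({
    "party": ["Dem", "Rep", "Ind"],             # nominal: categories, no order
    "satisfaction": ["low", "medium", "high"],  # ordinal: ordered categories
    "temp_f": [31.0, 68.5, 99.2],               # interval: evenly spaced, no true zero
    "num_children": [0, 2, 3],                  # ratio: true zero exists
})

# Nominal: an unordered categorical, so rank comparisons are not meaningful
df["party"] = pd.Categorical(df["party"])

# Ordinal: an ordered categorical, so comparisons like < and > are meaningful
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["low", "medium", "high"], ordered=True
)

print(df["satisfaction"] > "low")  # works: ordinal values can be rank-ordered
# df["party"] > "Dem"              # raises TypeError: nominal values cannot
```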

There are many ways to describe variables. One type of variable is called a dichotomous variable. A
dichotomous variable is a specific type of variable that has only two attributes. For example, if you were
taking a true or false test, the response would be dichotomous because you can answer only true or
false. There is no third choice. Another name for a dichotomous variable is a binary variable. Please
note that in datasets dichotomous variables are sometimes represented numerically using 1s and 0s.
For example, in a dataset of responses from a true/false test, true answers might be coded using 1s,
while false responses might be coded using 0s. When we use 1s and 0s to replace categories in binary
variables, we call this dummy coding or refer to these variables as dummy variables.
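
As a quick illustration, here is a minimal Python sketch (the responses are made up) of dummy coding a true/false item:

```python
import pandas as pd

# Made-up responses to a single true/false test item
responses = pd.Series(["true", "false", "true", "true", "false"])

# Dummy coding: replace the two categories with 1s (true) and 0s (false)
dummy = (responses == "true").astype(int)
print(dummy.tolist())  # [1, 0, 1, 1, 0]
```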

Scales are composite or combined measures based on the sum or average of responses. Using scales
creates a more complete measure of the concept than any single component or question. For example,
the Beck Depression Inventory contains 21 items related to depression. These 21 items together give a
better clinical picture than any one item on the inventory.

When using scales, be aware that each question on the scale may not measure the same concept, and
that there may be groups or clusters of questions within the scale that target different sub-concepts.
Also, some scales have weighted questions, which means that when a final score is compiled, some
questions count more toward the final score than other questions.
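
As an illustration, here is a small Python sketch of computing composite scores; the item responses and weights are invented for this example, not taken from any real instrument:

```python
import numpy as np

# Made-up responses of one participant to a 5-item scale, each item scored 0-3
items = np.array([2, 1, 3, 0, 2])

total_score = items.sum()    # composite based on the sum of responses
mean_score = items.mean()    # composite based on the average response

# If some questions count more toward the final score, apply weights
weights = np.array([1.0, 1.0, 2.0, 1.0, 1.5])
weighted_score = (items * weights).sum()

print(total_score, mean_score, weighted_score)  # 8 1.6 12.0
```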

There are several popular scales used in research. The first is the Likert scale. Likert scales measure the degree to which a participant has a particular attitude or feeling. For example, customer satisfaction surveys may include Likert items, such as “How satisfied were you with your shopping experience?” The response choices could include “very satisfied,” “somewhat satisfied,” “neither satisfied nor dissatisfied,” “somewhat dissatisfied,” and “very dissatisfied.” Please remember that a scale is a collection of questions, and an item is a single question. A Likert scale is not the same as a Likert-type item. Single items are not scales.
Another popular scale is called a Semantic Differential scale. This type of scale has a person select a rating between two opposite words. For example, a person may be asked to rate the cleanliness of a department store. On one side of the scale would be the word “clean,” and on the other side of the scale would be the word “dirty.”

Another type of scale is the Guttman scale. This type of scale captures different levels of a concept, so it is a more complex type of scale. The Beck Depression Inventory is an example of a Guttman scale.

When including treatment or an intervention as a variable, you must provide an explicit and detailed
description of the treatment. There must be a summary of how the treatment or intervention is
operationalized.

There are different ways to collect data. We can use direct measures – for example, administering a scale or inventory to a person. We can use indirect measures, such as secondary data, which are data that have already been collected by another researcher. Researchers can also collect primary data, which are data the researchers gather themselves. Data can also be collected using unobtrusive measures, such as obtaining data from websites, newspapers, archives, etc.

In some cases, a researcher may use more than one measure. This is called triangulation. Triangulation
means that the researcher is collecting data in different ways to help validate the results. For example, a
researcher might administer a depression inventory AND conduct a personal interview with the person
to ask about their depression.

In qualitative research, the researcher uses induction to arrive at conclusions about what the data mean.
Whereas in quantitative research, theory guides the research, in qualitative research the study is used to
create a theory. Here is a summary of the differences between qualitative and quantitative research.

Chapter 4 Lecture Notes Part II

Hello and welcome to Part II – a review of Measurement. No measurement instrument is perfect – we can always anticipate some error when collecting data. Researchers try to account for error and develop their methods to minimize errors. Systematic errors are defined as errors that occur due to a fault of the measuring device. In other words, systematic errors are related to a defect in the device used, an imperfect experimental design, or the way an instrument is being administered. These types of errors occur when the measuring devices are not used correctly.

Systematic errors include social desirability, which is when a participant responds to questions in a way that is socially acceptable. For example, if a researcher asks, “Have you ever used street drugs?” a participant may respond no, even if the participant did use street drugs. Another type of systematic error is acquiescence bias. Acquiescence bias occurs when a participant is too agreeable – the participant agrees to statements regardless of their content. Some researchers think this type of bias is due to politeness, because people tend to agree (to be polite) unless they have very strong negative feelings about a topic. It also happens when a person defers to a higher authority – such as an interviewer who the interviewee thinks is more knowledgeable or intelligent, or is of a higher social class. Finally, respondents may just want to “get it over with” and answer “agree” to speed up the process. Faced with a lengthy questionnaire they don’t want to take, people will check “agree” all the way down without even fully reading the questions.

Another type of bias occurs with leading questions. Leading questions are worded in a way that will sway the participant to one side of an argument. Usually you can tell a question is leading if it includes non-neutral wording. For example, “Should concerned social workers challenge unfair rules?” is an example of a leading question. By using the word concerned, you put social workers who don’t challenge rules on the defensive, thus creating bias. Instead, ask it this way: “Do you think social workers should challenge rules?”

There can also be systematic errors that occur because of subgroup response bias related to gender,
age, or race.

When we conduct a study, we need to consider measurement issues. Two measurement issues are
reliability and validity. Reliability is the degree of consistency in the way we measure variables. Types of
reliability include test-retest reliability, alternate-forms reliability, inter-rater reliability, and intra-rater
reliability. Let’s review each of these types of reliability.

Test-retest reliability is the degree to which a measurement tool consistently produces the same result over time, provided all other variables remain the same. For example, if I gave a person the Beck Depression Inventory twice in one day – in the morning and at night – I would expect the results to be the same as long as nothing happened during the day that would affect the results. Statistics that measure test-retest reliability produce scores to evaluate reliability. A score of 0.7 is considered strong, and a score of 0.8 or higher is very strong.
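
A common way to estimate test-retest reliability is to correlate the two sets of scores (the notes don’t name a statistic, so the Pearson correlation here is my assumption, and the scores are made up):

```python
import numpy as np

# Made-up inventory scores for five people tested in the morning and at night
morning = np.array([12, 25, 8, 31, 19])
evening = np.array([14, 24, 9, 30, 21])

# Pearson correlation between the two administrations
r = np.corrcoef(morning, evening)[0, 1]
print(round(r, 3))  # well above the 0.8 "very strong" benchmark for these data
```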

With test-retest reliability we assume that nothing else is affecting the variables of interest between the
times we administer a test, and there is no testing effect on participants – in other words, the
environment has not changed.

Internal consistency is a measure of how much the items on a measurement instrument are associated. If we use a tool to assess anxiety, we would like to know if the items consistently measure anxiety. There are a few ways to test internal consistency. We could use split-half reliability, which is when the items on an instrument are divided in half and the two halves are scored separately for the same respondents, to find out if both halves produce similar results. The comparison can be done using correlation or another statistic called Cronbach’s alpha.
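
Cronbach’s alpha can be computed directly from a respondents-by-items score matrix using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). A minimal sketch with made-up data:

```python
import numpy as np

def cronbachs_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a score matrix (rows = respondents, columns = items)."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up responses of four people to a 3-item anxiety measure
scores = np.array([
    [3, 2, 3],
    [1, 1, 2],
    [4, 3, 4],
    [2, 2, 2],
])
print(round(cronbachs_alpha(scores), 3))
```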

Alternate forms reliability refers to how similar the responses to one set of questions are to the responses to another set of similar questions. This type of reliability is also called parallel forms reliability or equivalent forms reliability. This type of reliability involves dividing one large set of questions into two alternate sets (“forms”), where both sets contain questions that measure the same construct, knowledge, or skill. The two sets of questions are given to the same sample of people within a short period of time, and an estimate of reliability is calculated from the two sets.

Interrater reliability is the level of agreement between different raters. Psychiatric diagnosis is a good
example of a measure that has rather poor interrater reliability. Three independent clinicians may give
the same person three different diagnoses.

By comparison, intra-rater reliability is when a single rater assesses a person at two different points in
time. High intra-rater reliability means that the rater would score the person similarly at both times.
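
The notes don’t name a statistic for rater agreement, but one widely used index for two raters is Cohen’s kappa, which corrects the raw percentage of agreement for chance. A rough sketch with hypothetical diagnoses:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    observed = np.mean(a == b)  # proportion of cases the raters agree on
    # Chance agreement, from each rater's marginal proportions per category
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return (observed - expected) / (1 - expected)

# Hypothetical diagnoses assigned by two clinicians to the same six patients
clinician_1 = ["PTSD", "MDD", "PTSD", "GAD", "MDD", "PTSD"]
clinician_2 = ["PTSD", "MDD", "GAD", "GAD", "MDD", "PTSD"]
print(round(cohens_kappa(clinician_1, clinician_2), 3))  # 1.0 would mean perfect agreement
```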

Validity determines how well the instrument measures what it is supposed to measure. For example, if
an inventory or scale is supposed to measure anxiety, it should measure anxiety and not depression or
any other type of mental health symptom. In other words, if an instrument or scale has high validity, we
can be confident that it measures what we want it to measure. Different types of validity include face
validity, content validity, criterion validity, and construct validity. Let’s review each of these terms.

The simplest and least robust type of validity is face validity. Face validity means that the instrument
looks like it measures what it is supposed to measure. Content validity refers to the instrument
measuring all dimensions of a construct according to experts and the literature. For example, if an
instrument was supposed to measure emotional maturity, it would need to be sensitive to all the
components of emotional maturity, such as maintaining an intimate relationship, longevity at a place of
employment, etc.

Criterion validity has two subtypes – concurrent and predictive. Concurrent criterion validity means that
an instrument that measures a construct is comparable to another validated instrument that measures
the same construct. Predictive validity is when an instrument can predict something that will occur in
the future – for example, job performance.

Construct validity depends on how well an instrument aligns with past research and current theory. Discriminant validity tests whether constructs that are not supposed to be related to one another actually are not related. Suppose we want to study depression in older adults. To measure depression (the construct), you use two measurements: a survey and participant observation. If the scores from your two measurements are close enough (i.e., they converge), this demonstrates that they are measuring the same construct. If they don’t converge, this could indicate they are measuring different constructs (for example, anger and depression, or self-worth and depression), and therefore the two measures are discriminant.
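
A simple way to check whether measures converge (a common choice, not specified in the notes) is to correlate them: a strong correlation between the two depression measures, and a weak one with an unrelated construct, supports convergent and discriminant validity. A sketch with invented scores:

```python
import numpy as np

# Invented scores from two depression measures and one unrelated construct
survey      = np.array([10, 22, 15, 30, 18, 25])   # depression survey
observation = np.array([ 9, 20, 17, 28, 16, 26])   # participant observation
self_worth  = np.array([22, 25, 10, 18, 30, 15])   # a different construct

# Convergent evidence: the two depression measures should correlate strongly
print(round(np.corrcoef(survey, observation)[0, 1], 2))  # high, near +1

# Discriminant evidence: depression and self-worth should correlate weakly
print(round(np.corrcoef(survey, self_worth)[0, 1], 2))   # near 0 for these data
```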

Known groups validity is the type of evidence generated by comparing known, independent groups on the survey outcomes and detecting the differences between them. For example, a group of individuals known not to be depressed should have lower scores on a depression scale than a group known to be depressed. Factorial validity is used to examine the interrelationships among the items in an instrument and to identify subsets of items that can be statistically grouped together. For example, the Strengths and Difficulties Questionnaire (SDQ) is a diagnostic screening tool for children. The items on the SDQ cover 5 factors: emotional symptoms, conduct problems, hyperactivity/inattention, peer relationship problems, and prosocial behavior. When these 5 factors are combined into an overall score, the instrument yields a measure of the child’s strengths and difficulties.
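
The notes don’t specify how to detect those group differences; one common choice is an independent-samples t-test. A minimal sketch with made-up scores:

```python
import numpy as np
from scipy import stats

# Made-up depression-scale scores for two known groups
not_depressed = np.array([5, 8, 6, 9, 7, 4])
depressed = np.array([22, 18, 25, 20, 27, 23])

# Known-groups evidence: the depressed group should score significantly higher
t_stat, p_value = stats.ttest_ind(depressed, not_depressed)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```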

There are several ways that reliability and validity can be improved. One is conducting a focus group with potential respondents to get feedback on the survey questions. A researcher could conduct cognitive interviews, which involve having a person complete the survey with the researcher, who asks questions about what the respondent is thinking about the survey items while taking the survey. A researcher could also audiotape test interviews during the pre-test phase of a survey and then analyze the audiotapes to uncover patterns or issues that need to be addressed to refine the survey.
