Introduction to SEM Using Partial Least Squares (PLS)
T. Ramayah
School of Management
Universiti Sains Malaysia
http://www.ramayah.com
Structural Equation Modeling
Structural Equation Modeling . . . is a family of statistical models that seek to explain the relationships among multiple variables.
It examines the structure of interrelationships
expressed in a series of equations, similar to a
series of multiple regression equations.
These equations depict all of the relationships
among constructs (the dependent and
independent variables) involved in the analysis.
Constructs are unobservable or latent factors
that are represented by multiple variables.
Called 2nd Generation Techniques
Distinguishing Features of SEM Compared to 1st Generation Techniques
It takes a confirmatory rather than an exploratory approach
Traditional methods are incapable of either assessing or correcting for measurement error
Traditional methods use only observed variables, whereas SEM can use both unobserved (latent) and observed variables
Relationships are tested in one complete model
Components of Error
The observed score comprises 3 components (Churchill, 1979):
True score
Random error (e.g., error caused by the order of items in the questionnaire or respondent fatigue) (Heeler & Ray, 1972)
Systematic error such as method variance (i.e., variance attributable to the measurement method rather than the construct of interest) (Bagozzi et al., 1991)
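Expressed as a simple equation (the symbols are generic illustrations, not notation from the slides):

X_{\text{observed}} = X_{\text{true}} + \varepsilon_{\text{random}} + \varepsilon_{\text{systematic}}

where the two error terms are the random and systematic components of the observed score.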
SEM
SEM, as a second-generation technique, allows the
simultaneous modeling of relationships among
multiple independent and dependent constructs
(Gefen, Straub, & Boudreau, 2000). Therefore, one
no longer differentiates between dependent and
independent variables but
distinguishes between the exogenous and endogenous
latent variables, the former being variables which are not
explained by the postulated model (i.e. act always as
independent variables) and the latter being variables that are
explained by the relationships contained in the model.
(Diamantopoulos, 1994, p. 108)
Terms
o Exogenous
constructs are the latent, multi-item
equivalent of independent variables. They use a
variate (linear combination) of measures to
represent the construct, which acts as an
independent variable in the model.
o Multiple measured variables (x) represent the exogenous
constructs.
o Endogenous constructs are the latent, multi-item
equivalent to dependent variables. These constructs
are theoretically determined by factors within the model.
o Multiple measured variables (y) represent the endogenous
constructs.
Indicators
Formative example (LIFE STRESS): X1 = Job loss, X2 = Divorce, X3 = Recent accident. Formative indicators can have a positive, negative, or zero correlation with one another (Hulland, 1999).
Reflective example (TIMELINESS): X1 = Accommodate last-minute requests, X2 = Punctuality in meeting deadlines, X3 = Speed of returning phone calls. Reflective indicators must be highly correlated (Hulland, 1999).
Two approaches to SEM
Covariance based: EQS, AMOS, SEPATH, COSAN, LISREL and MPLUS
Variance based (PLS): SmartPLS, PLS Graph
Why PLS?
Like covariance-based structural equation modeling (CBSEM), PLS is a latent variable modeling technique that incorporates multiple dependent constructs and explicitly recognizes measurement error (Karim, 2009)
In general, two applications of PLS are possible (Chin,
1998a): It can either be used for theory confirmation or
theory development. In the latter case, PLS is used to
develop propositions by exploring the relationships
between variables.
Reasons for using PLS
Researchers' arguments for choosing PLS as the statistical means for testing structural equation models (Urbach & Ahlemann, 2010) are as follows:
PLS makes fewer demands regarding sample size than other methods.
PLS does not require normally distributed input data.
PLS can be applied to complex structural equation models with a large number of constructs.
PLS is able to handle both reflective and formative constructs.
PLS is better suited for theory development than for theory testing.
PLS is especially useful for prediction.
Choice
Overall, PLS can be an adequate alternative to CBSEM if the problem has the following characteristics (Chin, 1998b; Chin & Newsted, 1999):
The phenomenon to be investigated is relatively new and
measurement models need to be newly developed,
The structural equation model is complex with a large number of
LVs and indicator variables,
Relationships between the indicators and LVs have to be modeled
in different modes (i.e., formative and reflective measurement
models),
The conditions relating to sample size, independence, or normal
distribution are not met, and/or
Prediction is more important than parameter estimation.
The 2-Step Approach
A structural equation modeling process requires two steps:
1. building and testing a measurement model, and
2. building and testing a structural model.
The measurement model serves to create a structural model including paths representing the hypothesized associations among the research constructs.
Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis (CFA) . . . is similar to EFA in some
respects, but philosophically it is quite different.
With CFA, the researcher must specify both the number of factors
that exist within a set of variables and which factor each variable
will load highly on before results can be computed.
So the technique does not assign variables to factors.
Instead the researcher must be able to make this assignment
before any results can be obtained.
SEM is then applied to test the extent to which a researcher's a priori pattern of factor loadings represents the actual data.
Review of and Contrast with Exploratory
Factor Analysis
EFA (exploratory factor analysis) explores the data and provides
the researcher with information about how many factors are
needed to best represent the data. With EFA, all measured
variables are related to every factor by a factor loading estimate.
Simple structure results when each measured variable loads
highly on only one factor and has smaller loadings on other
factors (i.e., loadings < .40).
The distinctive feature of EFA is that the factors are derived from
statistical results, not from theory, and so they can only be
named after the factor analysis is performed.
EFA can be conducted without knowing how many factors really
exist or which variables belong with which constructs. In this
respect, CFA and EFA are not the same.
Measurement Model and Construct Validity
One of the biggest advantages of CFA/SEM is its ability to
assess the construct validity of a proposed measurement
theory. Construct validity . . . is the extent to which a set
of measured items actually reflect the theoretical latent
construct they are designed to measure.
Construct validity is made up of two important components:
1. Convergent validity, assessed using three approaches:
o Factor loadings
o Variance extracted
o Reliability
2. Discriminant validity
Average Variance Extracted (AVE)

AVE = \frac{\sum_i \lambda_i^2}{\sum_i \lambda_i^2 + \sum_i \mathrm{var}(\varepsilon_i)}

where λ_i = loading of indicator i of a latent variable and var(ε_i) = variance of the measurement error of indicator i.

AVE should exceed 0.5 to suggest adequate convergent validity (Bagozzi & Yi, 1988).
Composite Reliability (CR)

CR_j = \frac{\left(\sum_i \lambda_{ij}\right)^2}{\left(\sum_i \lambda_{ij}\right)^2 + \sum_i \mathrm{var}(\varepsilon_{ij})}

where λ_ij = loading of indicator i of latent variable j, ε_ij = measurement error of indicator i, and j = flow index across all reflective measurement models.

Composite reliability should be 0.7 or higher to indicate adequate convergence or internal consistency (Gefen et al., 2000).
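A minimal Python sketch of both computations, assuming standardized reflective indicators so that each error variance can be taken as 1 − λ² (an assumption for illustration, not stated on the slides); the example loadings are the four Commitment loadings reported in the measurement-model table later in this deck:

```python
import numpy as np

def ave(loadings):
    """Average Variance Extracted: sum of squared loadings over
    (sum of squared loadings + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    err = 1.0 - lam**2              # error variance of a standardized indicator (assumption)
    return np.sum(lam**2) / (np.sum(lam**2) + np.sum(err))

def composite_reliability(loadings):
    """Composite Reliability: (sum of loadings)^2 over
    ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    err = 1.0 - lam**2
    return np.sum(lam)**2 / (np.sum(lam)**2 + np.sum(err))

commitment = [0.686, 0.767, 0.885, 0.751]           # Commitment loadings from the table
print(round(ave(commitment), 3))                    # ~0.60, above the 0.5 threshold
print(round(composite_reliability(commitment), 3))  # ~0.86, above the 0.7 threshold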
Cronbach Alpha (α)

\alpha_j = \frac{N}{N - 1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_t^2}\right)

where N = number of indicators assigned to the factor, σ_i² = variance of indicator i, σ_t² = variance of the sum of all assigned indicators' scores, and j = flow index across all reflective measurement models.

Cronbach alpha values should be 0.7 or higher to indicate adequate convergence or internal consistency (Nunnally, 1978).
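A minimal Python sketch of this formula applied to raw item scores; the data below are synthetic, generated only to illustrate the calculation:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: (N / (N - 1)) * (1 - sum of item variances /
    variance of the summed scale score)."""
    X = np.asarray(items, dtype=float)          # rows = respondents, columns = indicators
    n_items = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)           # variance of each indicator
    total_var = X.sum(axis=1).var(ddof=1)       # variance of the sum of all indicators
    return (n_items / (n_items - 1)) * (1.0 - item_vars.sum() / total_var)

# Synthetic example: three indicators driven by one common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=300)
items = np.column_stack([factor + rng.normal(scale=0.6, size=300) for _ in range(3)])
print(round(cronbach_alpha(items), 2))          # well above the 0.7 threshold for this setup
```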
Reporting Measurement Model

Model Construct   Measurement Item   Loading   Cronbach α   CR^a    AVE^b
Commitment        COMMIT1            0.686     0.843        0.856   0.601
                  COMMIT2            0.767
                  COMMIT3            0.885
                  COMMIT4            0.751
Communication     COMMUN1            0.842     0.860        0.873   0.696
                  COMMUN2            0.831
                  COMMUN3            0.829
Trust             TRUST1             0.580     0.744        0.759   0.527
                  TRUST2             0.587
                  TRUST3             0.948
Performance       PERFORM1           0.837     0.885        0.898   0.747
                  PERFORM2           0.900
                  PERFORM3           0.853

a Composite Reliability   b Average Variance Extracted
Presenting Measurement Items (Table)

Model Construct   Measurement Item   Standardized estimate   t-value
Commitment        COMMIT1            0.686                   5.230**
                  COMMIT2            0.767                   6.850**
                  COMMIT3            0.885                   22.860**
                  COMMIT4            0.751                   6.480**
Communication     COMMUN1            0.842                   20.628**
                  COMMUN2            0.831                   16.354**
                  COMMUN3            0.829                   15.011**
Trust             TRUST1             0.580                   1.960*
                  TRUST2             0.587                   2.284**
                  TRUST3             0.948                   2.640**
Performance       PERFORM1           0.837                   16.081**
                  PERFORM2           0.900                   33.456**
                  PERFORM3           0.853                   13.924**

t-values > 1.96* (p < 0.05); t-values > 2.58** (p < 0.01)
Discriminant Validity
The square root of the Average Variance Extracted (AVE) of each construct should exceed that construct's intercorrelations with the other constructs in the model to ensure discriminant validity (Chin, 2010; Chin, 1998b; Fornell & Larcker, 1981).
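A small Python sketch of this Fornell-Larcker check; both the AVE values and the construct correlation matrix below are hypothetical, not taken from the slides:

```python
import numpy as np

def fornell_larcker_ok(ave, corr):
    """Check that sqrt(AVE) of every construct exceeds its correlations
    with all other constructs (Fornell & Larcker, 1981)."""
    ave = np.asarray(ave, dtype=float)
    corr = np.asarray(corr, dtype=float)
    sqrt_ave = np.sqrt(ave)
    off_diag = corr - np.diag(np.diag(corr))          # zero out the diagonal
    return bool(np.all(sqrt_ave > np.abs(off_diag).max(axis=1)))

# Hypothetical values for three constructs.
ave = [0.60, 0.70, 0.53]
corr = [[1.00, 0.45, 0.38],
        [0.45, 1.00, 0.41],
        [0.38, 0.41, 1.00]]
print(fornell_larcker_ok(ave, corr))   # True: sqrt(AVE) exceeds every intercorrelation
```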
Communality and Redundancy
(Table columns: Construct, R², Communality, H², Redundancy^a, F², GOF.)
a Redundancy_j = Communality_j × R²_j. The redundancy index measures the quality of the structural model for each endogenous block, taking into account the measurement model (Tenenhaus et al., 2005).
Communality and Redundancy
H² and F² values should be greater than the threshold of 0 (Fornell & Cha, 1994).
Q-squares Statistics
The Q-squares statistics measure the predictive relevance of the model by reproducing the observed values through the model itself and its parameter estimates. A Q-square greater than 0 means that the model has predictive relevance, whereas a Q-square less than 0 means that the model lacks predictive relevance (Fornell & Cha, 1994).
In PLS, two kinds of Q-squares statistics are estimated: cross-validated (cv) communality (H²j) and cross-validated redundancy (F²j).
Both statistics are obtained through the blindfolding procedure in PLS. The blindfolding procedure (while estimating Q-squares) ignores a part of the data for a particular block during parameter estimation (a block of indicators is the set of measures for a construct). The ignored data part is then estimated using the estimated parameters, and the procedure is repeated until every data point has been ignored and estimated. Omission and estimation of data points for the blindfolded construct depend on the chosen omission distance G (Chin, 1998a).
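A simplified Python sketch of these ideas, assuming a systematic omission pattern (every G-th data point of the block) and the usual Q² = 1 − SSE/SSO form; this illustrates the mechanics only and is not the exact blindfolding routine of any particular PLS package:

```python
import numpy as np

def omission_groups(n_rows, n_cols, G=7):
    """Assign every data point of an indicator block to one of G omission
    groups: points whose running index differs by a multiple of G share a
    group, so each group is omitted (and then predicted) exactly once."""
    cell_index = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
    return cell_index % G

def q_squared(observed, predicted):
    """Q² = 1 - SSE / SSO: SSE sums the squared errors of the predicted
    (previously omitted) data points, SSO the squared deviations from the
    column means (the trivial prediction). Q² > 0 indicates predictive
    relevance."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    sso = np.sum((observed - observed.mean(axis=0)) ** 2)
    return 1.0 - sse / sso

# Toy illustration with synthetic data and imperfect (but informative) predictions.
rng = np.random.default_rng(1)
block = rng.normal(size=(50, 3))                      # 50 cases, 3 indicators of one construct
predictions = block + rng.normal(scale=0.5, size=block.shape)
print(omission_groups(*block.shape, G=7)[:2])         # omission pattern for the first two cases
print(round(q_squared(block, predictions), 2))        # > 0, i.e. predictive relevance
```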
Q-squares Statistics
A cross-validated communality H²j is obtained if prediction of the omitted data points in the manifest variables block is made by the underlying latent variable (Chin, 1998a). In other words, the cv-communality H²j measures the capacity of the path model to predict the manifest variables (MVs) directly from their own latent variable (LV) by cross-validation. It uses only the measurement model. The prediction of an MV of an endogenous block is carried out using only the MVs of this block (Tenenhaus et al., 2005).
On the other hand, a cross-validated redundancy predicts the omitted data points by constructs that are predictors of the blindfolded construct in the PLS model (Chin, 1998a). In other words, the cv-redundancy F²j measures the capacity of the path model to predict the endogenous MVs indirectly, from a prediction of their own LV using the related structural relation, by cross-validation. The prediction of an MV of an endogenous block j is carried out using all the MVs of the blocks j* related with the explanatory LVs of the dependent LVj (Tenenhaus et al., 2005). This index is used for measuring the quality of the path model.
In line with the effect size (f²), the relative impact of the structural model on the observed measures for a latent dependent variable is evaluated by means of q² (Henseler et al., 2009). q² values of 0.02, 0.15, and 0.35 signify small, medium, and large predictive relevance of a certain latent variable in explaining the endogenous latent variable under evaluation.
Goodness of Fit (GOF)
What is Goodness of Fit (GoF)?
Goodness-of-fit (GoF) (Tenenhaus et al., 2005) is used to judge the overall fit of the model.
GoF, which is the geometric mean of the average communality (outer measurement model) and the average R² of the endogenous latent variables, represents an index for validating the PLS model globally, seeking a compromise between the performance of the measurement model and the structural model, respectively.
GoF is normed between 0 and 1, where a higher value represents better path model estimation.
Assessing Goodness of Fit (GoF)

GoF = \sqrt{\overline{R^2} \times \overline{\mathrm{Communality}}}

Global validation of PLS models uses these cut-off values (Wetzels et al., 2009):
GoF_small = 0.10
GoF_medium = 0.25
GoF_large = 0.36
This allows one to conclude that the model has better explanatory power in comparison with the baseline model.
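As a hypothetical worked example (values chosen purely for illustration): with an average communality of 0.64 and an average R² of 0.36,

GoF = \sqrt{0.36 \times 0.64} = 0.48

which exceeds the 0.36 cut-off and would therefore be interpreted as a large GoF.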
Assessing R²
According to Chin (1998b), R² values for endogenous latent variables are assessed as follows:
0.67: substantial
0.33: moderate
0.19: weak
Also, path coefficients ranging between 0.20 and 0.30, together with measures that explain 50% or more of the variance, are acceptable (Chin, 1998b).
Calculating Effect Size (f²)
Effect size f² is not automatically given in PLS; we have to calculate it manually using the formula:

f^2 = \frac{R^2_{\mathrm{included}} - R^2_{\mathrm{excluded}}}{1 - R^2_{\mathrm{included}}}

According to Cohen (1988), f² is assessed as:
small: 0.02
medium: 0.15
large: 0.35
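A minimal Python sketch of this manual calculation together with Cohen's (1988) thresholds; the two R² values used in the example are hypothetical:

```python
def effect_size_f2(r2_included, r2_excluded):
    """Cohen's f²: change in R² when a predictor construct is dropped,
    scaled by the unexplained variance of the full model."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)

def label_f2(f2):
    """Classify f² against Cohen's (1988) thresholds."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"

f2 = effect_size_f2(r2_included=0.45, r2_excluded=0.38)   # hypothetical R² values
print(round(f2, 3), label_f2(f2))                         # 0.127 small
```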
Presenting the results (Figure)
Presenting the results (Table)
The t-values are generated using bootstrapping with 200 re-samples (Chin, 1998b).

Relationship        Coefficient   t-value    Supported
H1: SYS → SAT       0.23          2.588**    YES
H2: IQ → SAT        0.17          1.725      NO
H3: SQ → SAT        0.24          2.645**    YES
H4: SAT → INT       0.38          3.895**    YES

t-values > 1.96* (p < 0.05); t-values > 2.58** (p < 0.01)
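A rough Python sketch of the bootstrap logic behind such t-values: re-estimate the coefficient on resampled cases and divide the original estimate by the standard deviation of the bootstrap estimates. For brevity the "path coefficient" here is a simple standardized regression slope rather than a full PLS estimate, and the data are synthetic; the 200 re-samples follow the slide:

```python
import numpy as np

def path_coefficient(x, y):
    """Standardized slope of y on x (stands in for a PLS path coefficient)."""
    x = (x - x.mean()) / x.std(ddof=1)
    y = (y - y.mean()) / y.std(ddof=1)
    return float(np.polyfit(x, y, 1)[0])

def bootstrap_t(x, y, n_resamples=200, seed=42):
    """t-value = original coefficient / std. dev. of bootstrap coefficients."""
    rng = np.random.default_rng(seed)
    original = path_coefficient(x, y)
    boot = []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(x), size=len(x))   # resample cases with replacement
        boot.append(path_coefficient(x[idx], y[idx]))
    return original / np.std(boot, ddof=1)

# Synthetic data with a genuine (moderate) relationship.
rng = np.random.default_rng(0)
x = rng.normal(size=150)
y = 0.3 * x + rng.normal(size=150)
print(round(bootstrap_t(x, y), 2))   # typically > 2.58, i.e. significant at p < 0.01
```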
Moderator Effect Assessment
According to Cohen (1988), the moderator effect size f² is assessed as:
small: 0.02
medium: 0.15
large: 0.35
Mediator Effect Assessment
Can use the Sobel (1982) test. Online calculator: http://www.people.ku.edu/~preacher/sobel/sobel.htm
VAF = Variance Accounted For = Indirect effect / Total effect (VAF is converted into a percentage).
The indirect effect is significant at:
p < 0.05 if z > 1.96
p < 0.01 if z > 2.58
where a and b = the two path coefficients, sa and sb = the standard errors of a and b, and ta and tb = the t-values for the a and b path coefficients generated from bootstrapping.
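The slide's formula image did not survive conversion; the standard Sobel (1982) statistic is z = a·b / sqrt(b²·sa² + a²·sb²), with a, b, sa, and sb as defined above. A small Python sketch combining it with the VAF calculation, using hypothetical numbers:

```python
import math

def sobel_z(a, b, sa, sb):
    """Sobel (1982) z statistic: z = a*b / sqrt(b²·sa² + a²·sb²),
    where a and b are the two path coefficients and sa, sb their standard errors."""
    return (a * b) / math.sqrt(b**2 * sa**2 + a**2 * sb**2)

def vaf(indirect, direct):
    """Variance Accounted For: indirect effect / total effect, as a percentage."""
    return 100.0 * indirect / (indirect + direct)

a, b, sa, sb = 0.40, 0.35, 0.10, 0.09       # hypothetical path coefficients and standard errors
direct = 0.25                               # hypothetical direct effect
z = sobel_z(a, b, sa, sb)
print(round(z, 2))                          # ≈ 2.79 > 2.58: indirect effect significant at p < 0.01
print(round(vaf(a * b, direct), 1), "%")    # share of the total effect that is mediated
```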
References
Hulland, J. (1999). Use of partial least squares (PLS) in strategic management research: A review of four recent studies. Strategic Management Journal, 20, 195-204.
Bagozzi, R. P., Yi, Y., & Phillips, L. W. (1991). Assessing construct validity in organizational research. Administrative Science Quarterly, 36, 421-458.
Heeler, R. M., & Ray, M. L. (1972). Measure validation in marketing. Journal of Marketing Research, 9, 361-370.
Churchill, G. A., Jr. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16, 64-73.
Diamantopoulos, A. (1994). Modelling with LISREL: A guide for the uninitiated. Journal of Marketing Management, 10, 105-136.
Gefen, D., Straub, D. W., & Boudreau, M.-C. (2000). Structural equation modeling and regression: Guidelines for research practice. Communications of the Association for Information Systems, 4, 1-79.
Chin, W. W. (2000). Partial least squares for researchers: An overview and presentation of recent advances using the PLS approach. Available at: http://disc-nt.cba.uh.edu/chin/icis2000plstalk.pdf
References
Chin, W. W. (2010). How to write up and report PLS analyses. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications (pp. 645-689). New York: Springer.
Chin, W. W. (1998a). The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research. Mahwah, NJ: Lawrence Erlbaum Associates.
Chin, W. W. (1998b). Issues and opinion on structural equation modeling. MIS Quarterly, 22(1), 7-16.
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39-50.
Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159-205.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.