Session 7
MULTIVARIATE DATA ANALYSIS
Contents…
1. Introduction to multivariate analysis
2. Dependence methods
3. Interdependence methods
12/08/21 1
I. INTRODUCTION
Involves the simultaneous analysis of >2 variables
Advanced statistical techniques
Powerful in solving complex research problems
“These techniques are extremely dangerous when being
used by unskilled people” because “there are a number of
problems and statistical assumptions related to each
technique”.
(Kinnear & Taylor, 1987, p.525).
Dependence methods: One or more variables have
been designated as being predicted by a set of
independent variables.
Multiple regression, ANOVA, Conjoint analysis,
Discriminant analysis, Structural Equation Modeling...
Interdependence methods: No variable(s) are
designated as being predicted by others. It is the
interrelationship among all the variables taken together
that interests the researcher.
Factor analysis, Cluster analysis, Multidimensional scaling.
II. DEPENDENCE METHODS
Scale requirement
Required scale of variable(s)

Method                                      Dependent   Independent
One dependent variable
  Multiple regression                       Interval    Interval
  ANOVA                                     Interval    Nominal
  Multiple regression with dummy variable   Interval    Nominal
  Discriminant analysis                     Nominal     Interval
  Conjoint analysis                         Ordinal     Nominal
Two or more dependent variables
  Canonical analysis                        Interval    Interval
  MANOVA                                    Interval    Nominal
Network structure including many dependent and independent variables
  SEM                                       Interval    Interval
II.1 Multiple Regression
Y = a1X1 + a2X2 + a3X3 + ... + anXn + b
One DV, two or more IDVs
All are intervally scaled variables (except dummy variable)
Three key results to analyze:
The goodness of fit of the multiple regression equation:
represented by r2, ranging from 0 to 1 (coefficient of determination):
% of variation of Y explained by the regression.
Test of the significance level of r2: use the F-test (sig.)
Test of the significance level of each regression coefficient
(a1, a2, a3,…): use the t-test (sig.)
(SPSS provides all sig. levels)
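The key results can be reproduced by hand on a small data set. The following is a minimal sketch in plain Python, not the SPSS procedure: it fits the equation by ordinary least squares and reports r2 and the F statistic. The data values and the function names (solve, fit_ols) are hypothetical and chosen only for illustration; significance levels are left to a statistics package.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Return (coef, r2, F); coef[0] is the constant b, coef[1:] are a1..ak."""
    n, k = len(X), len(X[0])
    Z = [[1.0] + list(row) for row in X]          # add the constant column
    p = k + 1
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    coef = solve(ZtZ, Zty)                        # normal equations
    y_hat = [sum(c * v for c, v in zip(coef, z)) for z in Z]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return coef, r2, f_stat


# Hypothetical data: two IDVs and one DV, six observations.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [4.1, 5.4, 8.9, 10.6, 14.1, 15.4]
coef, r2, f_stat = fit_ols(X, y)
print("b, a1, a2 =", [round(c, 3) for c in coef])
print("r2 =", round(r2, 4), " F =", round(f_stat, 1))
```

The F statistic is compared with the F distribution on (k, n - k - 1) degrees of freedom to obtain the sig. value that SPSS prints.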
Assumptions in multiple regression
a. Linearity: relationships between DV and IDVs are linear.
Test by observing the scatter diagram or correlation matrix
b. No multicollinearity: no strong linear correlation among IDVs.
Test by investigating “Tolerance” or VIF
c. Normality of all variables and of all residuals
d. Constant variance of the error term (Homoscedasticity)
e. Independence of the Error Terms
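To make the Tolerance / VIF check in (b) concrete, here is a small sketch for the special case of two IDVs, where Tolerance = 1 - r^2 (r being the correlation between the two IDVs) and VIF = 1 / Tolerance. With more IDVs, r^2 is replaced by the R^2 obtained from regressing one IDV on all the others. The ratings and function names are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x1, x2):
    """Two-IDV case only: Tolerance = 1 - r^2, VIF = 1 / Tolerance."""
    r = pearson_r(x1, x2)
    tol = 1 - r ** 2
    return tol, 1 / tol


# Hypothetical ratings for two IDVs from eight respondents.
rewards = [3, 4, 2, 5, 4, 3, 5, 2]
recognition = [2, 4, 3, 4, 5, 2, 4, 3]
tol, vif = tolerance_and_vif(rewards, recognition)
print("Tolerance =", round(tol, 3), " VIF =", round(vif, 3))
```

A Tolerance close to 1 (VIF close to 1) indicates little overlap among the IDVs; a Tolerance close to 0 signals a multicollinearity problem.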
Notes when using multiple regression:
Applicable when linear correlations exist among the variables.
Does not prove a causal relationship.
Can be used for prediction or explanation.
There should be more than 10 observations per IDV
(required sample size).
If IDV is nominally scaled, dummy variable regression can
be employed
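A sketch of how a nominal IDV might be turned into dummy variables before it enters the regression: a variable with k categories becomes k - 1 columns of 0/1, and the omitted category serves as the baseline. The category names and the helper dummy_code are hypothetical.

```python
def dummy_code(values, reference):
    """Return (column_names, rows) of 0/1 dummies; `reference` is the omitted baseline."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical nominal IDV with three categories.
industry = ["Garment", "Garment", "Cosmetic", "Plastic"]
columns, rows = dummy_code(industry, reference="Garment")
print(columns)
for name, row in zip(industry, rows):
    print(name, row)
```

Each dummy coefficient is then read as the difference in the DV between that category and the baseline category, holding the other IDVs constant.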
Example:
Identifying the determinants of employee satisfaction in XYZ Co.
DV: Employee satisfaction.
IDVs: Rewards, Working condition, Recognition by managers,
Peer relationship, Promotion Opport., Development Opport.
IDVs          Unstandardized      Standardized   t       Sig.   Collinearity
              Coefficients        Coefficients                  Statistics
              B       Std. Error  Beta                          Tolerance  VIF
(Constant)    0.540   0.193                      2.793   .007
Rewards       0.526   0.081       0.596          6.491   .000   .793       1.062
Recognition   0.205   0.061       0.310          3.380   .001   .793       1.262
r2 = 0.619    F sig. = 0.000
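The unstandardized B values in the table define the fitted equation: Satisfaction = 0.540 + 0.526 × Rewards + 0.205 × Recognition. A short sketch of using it for prediction follows; the input ratings (4 and 3) are hypothetical, and it is assumed they are on the same scale as the survey items, which the slide does not state.

```python
# Unstandardized coefficients taken from the table above.
B_CONSTANT = 0.540
B_REWARDS = 0.526
B_RECOGNITION = 0.205


def predict_satisfaction(rewards, recognition):
    """Predicted employee satisfaction from the fitted regression equation."""
    return B_CONSTANT + B_REWARDS * rewards + B_RECOGNITION * recognition


# Hypothetical employee: Rewards rated 4, Recognition rated 3.
example = predict_satisfaction(rewards=4, recognition=3)
print(round(example, 3))
```

For comparing the relative influence of the two IDVs, the standardized Beta column is the one to read (0.596 for Rewards against 0.310 for Recognition).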
II.2. ANOVA – ANALYSIS OF VARIANCE
Non-metric IDVs and metric DV
Used to compare means of DV under the impact of one or
more IDVs.
Can be used with more than one IDV (factorial ANOVA).
Principle: if between-group variance is significantly larger than
within-group variance, the group means differ significantly.
Family: ANCOVA / MANOVA / MANCOVA
Example of ANOVA:
A survey of 200 companies in garment, cosmetic
and plastic industries about their average expenses
for sales promotion during the last three years.
The researcher wants to explore whether there are
significant differences in the average expenses for
sales promotion among these three industries
Company No.   Industry   SP expenses (1000 USD)
1             Garment    123
2             Garment    235
3             Cosmetic   1346
4             Plastic    876
..            ..
199           Plastic    68
200           Garment    12
IDV: Industry (nominal, 3 treatments)
DV: Sales Promotion expenses (ratio)
Possible method: compare the mean values of DV for
each pair of industries (using t – test).
However, when the No. of treatments increases the
comparisons become arduous.
In such a situation, ANOVA is the better method:
H0: μ1 = μ2 = ... = μk = μ
Ha: at least one μi is significantly different from the
others.
where μ = population mean
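The test of H0 rests on the F ratio of between-group to within-group variance. Below is a minimal sketch of a one-way ANOVA F computation in plain Python. The expense figures are hypothetical (only a few of them echo the sample rows of the data table), and the function name is an assumption for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical SP expenses (1000 USD) for a few companies per industry.
garment = [123, 235, 12, 150]
cosmetic = [1346, 900, 1100, 1200]
plastic = [876, 68, 400, 500]
f_value, df_b, df_w = one_way_anova_f([garment, cosmetic, plastic])
print("F =", round(f_value, 2), "with df =", (df_b, df_w))
```

H0 is rejected when F exceeds the critical value of the F distribution for (k - 1, N - k) degrees of freedom, which is the sig. value reported by statistical software.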
II.3. DISCRIMINANT ANALYSIS
Purpose: to identify the linear combination of IDVs that
best discriminates among the prespecified groups that are
formed on the basis of a DV.
Metric IDVs, Nominal DV.
Outcomes: A linear combination:
Y = v1.X1 + v2.X2 + v3.X3 + … and a critical score Ycri
For a particular subject:
Calculate its Y score,
Compare Y with Ycri
to predict which group the subject belongs to.
Example
An IT trading company wants to know whether family income
(X1) and householder’s education (X2) are useful to discriminate
between PC buyers and non-PC buyers.
Conduct a survey of n households (with / without a PC).
IDVs: X1 – income, X2 – education : metric variables
DV: with a PC, without a PC: categorical variable
Analysis results: discriminant function Y= v1X1 + v2X2
v1 , v 2 : discriminant coefficients
Ycri : critical score
Given a household i (X1i and X2i ) we can predict whether it is
a (potential) buyer.
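One way to obtain v1, v2 and Ycri is Fisher's two-group linear discriminant, sketched below in plain Python. This is an illustrative assumption, not necessarily the procedure used by any particular package: the coefficients come from the pooled within-group covariance matrix, and Ycri is taken as the midpoint between the two group mean scores, which presumes equal group sizes and equal misclassification costs. The household data and function names are hypothetical.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 within-group sum-of-squares-and-cross-products matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(group_a, group_b):
    """Return (v, y_cri): discriminant coefficients and midpoint critical score."""
    ma, mb = mean_vec(group_a), mean_vec(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    sw = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    v = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    ya = v[0] * ma[0] + v[1] * ma[1]      # mean score of group a
    yb = v[0] * mb[0] + v[1] * mb[1]      # mean score of group b
    return v, (ya + yb) / 2


# Hypothetical households: (income X1, years of education X2).
buyers = [(60, 16), (75, 14), (80, 18), (65, 15)]
non_buyers = [(30, 10), (40, 12), (35, 9), (45, 13)]
v, y_cri = fisher_discriminant(buyers, non_buyers)


def predict(x1, x2):
    """Scores above Ycri fall on the buyers' side of the cut-off."""
    return "buyer" if v[0] * x1 + v[1] * x2 > y_cri else "non-buyer"


print(predict(70, 15), predict(35, 10))
```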
II.4. CONJOINT ANALYSIS
Deriving the utility values attached to various attributes of an
object based on respondents’ overall preferences for different
bundles of attributes/ profiles of the object.
Nonmetric IDVs - Ordinal DV
The researcher designs a number of test alternatives. Each
alternative represents a combination of treatments.
Respondents are asked to rank the alternatives according to
their preference.
Conjoint analysis derives a utility score for each attribute,
representing its relative importance to the overall preference.
Applications:
To test a new product with various attributes (quality, packaging,
price...), each with several treatments (high/medium/low).
To estimate the market shares of different brands.
To segment the market / categorize study subjects.
Example
Test a new product with 3 attributes:
Price: (high, medium, low)
Package size: (small, medium, large)
Features: (simple, complex)
Form 8 test alternatives (instead of 18 combinations).
Ask respondents to rank-order the alternatives
Results:
contribution of each attribute to overall preference
preference of each treatment in an attribute.
identify the most preferred combination.
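The three results can be illustrated with a simplified sketch. It departs from the slide in two labeled ways: it uses the full set of 18 profiles rather than a reduced set of 8, and it uses preference scores (higher = more preferred) rather than ranks. Part-worths are estimated as the mean score of each treatment minus the grand mean, which is a main-effects shortcut that works for a balanced full design. The utilities in ASSUMED are invented solely to generate example scores.

```python
from itertools import product

ATTRIBUTES = {
    "price": ["high", "medium", "low"],
    "package_size": ["small", "medium", "large"],
    "features": ["simple", "complex"],
}

# Every combination of treatments: 3 * 3 * 2 = 18 profiles.
profiles = list(product(*ATTRIBUTES.values()))


def part_worths(profiles, scores):
    """Mean score of each treatment minus the grand mean, per attribute."""
    grand = sum(scores) / len(scores)
    result = {}
    for idx, (attr, levels) in enumerate(ATTRIBUTES.items()):
        result[attr] = {}
        for level in levels:
            vals = [s for p, s in zip(profiles, scores) if p[idx] == level]
            result[attr][level] = sum(vals) / len(vals) - grand
    return result


def importance(worths):
    """Relative importance = range of an attribute's part-worths / sum of ranges."""
    ranges = {a: max(w.values()) - min(w.values()) for a, w in worths.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}


# Hypothetical respondent: scores generated from assumed utilities.
ASSUMED = {
    "price": {"high": 0, "medium": 2, "low": 4},
    "package_size": {"small": 0, "medium": 1, "large": 2},
    "features": {"simple": 0, "complex": 1},
}
scores = [sum(ASSUMED[a][lvl] for a, lvl in zip(ATTRIBUTES, p)) for p in profiles]

worths = part_worths(profiles, scores)
weights = importance(worths)
best = max(zip(scores, profiles))[1]
print(weights)
print("most preferred combination:", best)
```

With a reduced design such as the 8 alternatives on the slide, the part-worths are usually estimated by regression on dummy-coded treatments instead of simple averages.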
II.5. Structural Equation Modeling - SEM
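[Figure: SEM path diagram with the constructs Customer orientation, Competitor orientation, Functional coordination, Profit orientation, Responsiveness, Market orientation, Management competencies and Business performance. Path coefficients shown in the figure: .80, .73, .33, .62, .83, .34, .59, .83]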
III. INTERDEPENDENCE METHODS
III.1. Factor Analysis
Used for data / variable reduction by grouping variables into
representative factors.
Metric variables
Application:
Developing multi-item scale
Explore the pattern of a data set
Reduce the dimensions in a data set
Example:
Case X1 X2 X3 …. …. Xm
1
2
3
…
n
Factor analysis: grouping m variables into k factors
Factor 1 includes X1 X6 X9 Xm
Factor 2 includes X2 X3 X10 Xm - 1
Factor 3 includes X4 X5 X7 X8 ...
Exploratory factor analysis (EFA)
Confirmatory factor analysis (CFA).
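As a rough illustration of how variables are grouped, the sketch below extracts the first principal component of a correlation matrix by power iteration. This is only the extraction step that often starts an exploratory analysis, not a full EFA (no rotation, one factor only). The data and function names are hypothetical.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of variable columns."""
    def center_scale(col):
        m = sum(col) / len(col)
        length = sum((x - m) ** 2 for x in col) ** 0.5
        return [(x - m) / length for x in col]

    z = [center_scale(c) for c in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def first_component(R, iterations=200):
    """Power iteration: leading eigenvalue of R and the variables' loadings on it."""
    m = len(R)
    v = [1.0] * m
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(m)) for i in range(m))
    loadings = [x * eigenvalue ** 0.5 for x in v]
    return eigenvalue, loadings


# Hypothetical data: X1 and X2 move together, X3 does not.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
x3 = [3, 5, 1, 6, 2, 4]
R = correlation_matrix([x1, x2, x3])
eigenvalue, loadings = first_component(R)
print("eigenvalue =", round(eigenvalue, 3))
print("loadings =", [round(l, 3) for l in loadings])
```

Variables with high loadings on the same component are candidates for the same factor; a common rule of thumb keeps components whose eigenvalue exceeds 1.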
III.2. Cluster analysis
Segmenting objects into homogeneous groups, given
data for the objects on a variety of characteristics.
Ex: Market Segmentation
Buying behavior Typology
Procedure:
- Identify variables / characteristics for grouping
- Segmenting based on similarities - distances.
- Labeling clusters based on their shared characteristics.
- Validation and profiling
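The segmenting step can be sketched with k-means, one of several clustering algorithms (hierarchical clustering is another common choice; the slides do not specify which is used). The sketch uses squared Euclidean distance as the dissimilarity measure. The respondent ratings and starting centroids are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, start_centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then update centroids."""
    centroids = [list(c) for c in start_centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical respondents rated on five characteristics (1 = very important).
respondents = [
    [1, 4, 4, 1, 2],
    [2, 5, 4, 1, 1],
    [1, 4, 5, 2, 1],
    [1, 1, 2, 4, 5],
    [2, 1, 1, 5, 4],
    [1, 2, 1, 4, 4],
]
labels, centroids = k_means(respondents, [respondents[0], respondents[3]])
print(labels)
for c in centroids:
    print([round(x, 2) for x in c])
```

The centroid of each cluster (its average rating per characteristic) is what the labeling and profiling steps work from.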
Example: Segmenting the detergent market
Metric Scales
Based on consumer buying behaviors.
“Please indicate the importance level (from 1 for very important
to 5 for not important at all) of the following factors when you
consider buying detergent powder”
X1 – Product quality ____
X2 – Price ____
X3 – Convenience ____
X4 – Known brand ____
X5 – Sales promotion ____
300 consumers have been surveyed
The data are cluster analyzed to identify different clusters.
Customers within each cluster have similar perceptions of
the importance of X1 to X5 in their buying decision.
Results:
Cluster 1 – (Young, urban, medium/high income)
X1, X4, X5 are important;
Cluster 2 – (Industrial / business customers)
X1, X2, X3 are important.
From these findings, the company will develop its
targeting strategy and business plan.
III.3. Multidimensional scaling (perceptual mapping)
Inferring the number / nature of dimensions underlying
respondent perceptions based on their judgements about
objects (brands, products, companies, localities, etc.)
Metric / nonmetric scale
Identifying the relative positions (on a map) of competitive
brands based on several dimensions.
Example: MDS result for TV brands in HCMC
PRACTICE PROJECT
Work on your data set to answer the research objectives:
Determinants of learning effectiveness?
How to improve learning effectiveness?
Simplified procedure:
For each concept in the model, pick one representative
variable
Run multiple regression with “learning effectiveness” as
dependent variable and others as independent variables
Interpret the results to answer the research objectives
PRACTICE PROJECT
A better procedure:
Assess and refine the scales by using Factor
analysis and Reliability assessment
Calculate factor scores using the qualified variables
Multiple regression
Interpret the results
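For the reliability assessment step, a commonly used statistic is Cronbach's alpha; the slides do not name a specific measure, so treating it as the intended one is an assumption. A minimal sketch with hypothetical item scores:

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of item columns, each a list of scores from the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # scale score per respondent
    item_var = sum(variance(col) for col in items)        # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 3-item scale answered by five respondents.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Items that lower alpha noticeably are candidates for removal before factor scores are calculated; a value of about 0.7 or higher is a frequently cited rule of thumb for an acceptable scale.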
The SEM result
[Figure: SEM path diagram relating teaching method, assessment, instructor devotion, reference readability, learning motive and active participation to learning effectiveness. Path coefficients shown in the figure: .68, .50, .29, .12, .14, .58, .42, .32, .34, .29, .24, .13, .16]
78% of the variation of LEARNING EFFECTIVENESS
can be explained by this model
END SESSION 7