
5. Multivariate Methods by Projection

5.1 Principal Component Analysis
A principal component analysis is concerned with explaining the variance-
covariance structure of a set of variables through a few linear combinations of
these variables. Its general objectives are
(1) data reduction
Although p components are required to reproduce the total system variability,
often much of this variability can be accounted for by a small number k of
the principal components.
(2) interpretation
An analysis of principal components often reveals relationships that were
not previously suspected and thereby allows interpretations that would not
ordinarily result.

5.1.1 Population Principal Components

• Algebraically, principal components are particular linear combinations of the
p random variables X1, X2, . . . , Xp.

• Geometrically, these linear combinations represent the selection of a
new coordinate system obtained by rotating the original system with
X1, X2, . . . , Xp as the coordinate axes. The new axes represent the
directions with maximum variability and provide a simpler and more
parsimonious description of the covariance structure.

• Principal components depend solely on the covariance matrix Σ (or the
correlation matrix ρ) of X1, X2, . . . , Xp. Their development does not require
a multivariate normal assumption. On the other hand, principal components
derived for multivariate normal populations have useful interpretations in
terms of the constant density ellipsoids.

Let the random vector X′ = [X1, X2, . . . , Xp] have the covariance matrix Σ
with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λp ≥ 0.

Consider the linear combinations

    Y1 = a′1X = a11X1 + a12X2 + · · · + a1pXp
    Y2 = a′2X = a21X1 + a22X2 + · · · + a2pXp
    ...
    Yp = a′pX = ap1X1 + ap2X2 + · · · + appXp

Then

    Var(Yi) = a′iΣai,        i = 1, 2, . . . , p
    Cov(Yi, Yk) = a′iΣak,    i, k = 1, 2, . . . , p
Define
First principal component = linear combination a′1X that maximizes
    Var(a′1X) subject to a′1a1 = 1
Second principal component = linear combination a′2X that maximizes
    Var(a′2X) subject to a′2a2 = 1 and Cov(a′1X, a′2X) = 0
At the ith step,
ith principal component = linear combination a′iX that maximizes
    Var(a′iX) subject to a′iai = 1 and Cov(a′iX, a′kX) = 0 for k < i

Results 5.1 Let Σ be the covariance matrix associated with the random
vector X′ = [X1, X2, . . . , Xp]. Let Σ have the eigenvalue-eigenvector pairs
(λ1, e1), (λ2, e2), . . . , (λp, ep) where λ1 ≥ λ2 ≥ · · · ≥ λp ≥ 0. Then the ith
principal component is given by

    Yi = e′iX = ei1X1 + ei2X2 + · · · + eipXp,   i = 1, 2, . . . , p

With these choices,

    Var(Yi) = e′iΣei = λi,      i = 1, 2, . . . , p
    Cov(Yi, Yk) = e′iΣek = 0,   i ≠ k

If some λi are equal, the choices of the corresponding coefficient vectors, ei, and
hence Yi, are not unique.

Results 5.2 Let X′ = [X1, X2, . . . , Xp] have covariance matrix Σ, with
eigenvalue-eigenvector pairs (λ1, e1), (λ2, e2), . . . , (λp, ep) where λ1 ≥ λ2 ≥
· · · ≥ λp ≥ 0. Let Y1 = e′1X, Y2 = e′2X, . . . , Yp = e′pX be the principal
components. Then

    σ11 + σ22 + · · · + σpp = ∑_{i=1}^p Var(Xi) = λ1 + λ2 + · · · + λp = ∑_{i=1}^p Var(Yi)
Results 5.3 If Y1 = e′1X, Y2 = e′2X, . . . , Yp = e′pX are the principal
components obtained from the covariance matrix Σ, then

    ρ_{Yi,Xk} = eik √λi / √σkk,   i, k = 1, 2, . . . , p

are the correlation coefficients between the components Yi and the variables
Xk. Here (λ1, e1), (λ2, e2), . . . , (λp, ep) are the eigenvalue-eigenvector pairs for
Σ.

Example 5.1 Suppose the random variables X1, X2 and X3 have the covariance
matrix

        ⎡  1  −2  0 ⎤
    Σ = ⎢ −2   5  0 ⎥
        ⎣  0   0  2 ⎦

Calculate the population principal components.
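A quick numerical check of this example (a sketch assuming NumPy is available; not part
of the original notes):

    import numpy as np

    # Covariance matrix of Example 5.1
    Sigma = np.array([[ 1., -2., 0.],
                      [-2.,  5., 0.],
                      [ 0.,  0., 2.]])

    # np.linalg.eigh returns eigenvalues in ascending order, so reverse them
    lam, E = np.linalg.eigh(Sigma)
    lam, E = lam[::-1], E[:, ::-1]

    print("variances Var(Yi) = lambda_i:", lam)
    print("coefficient vectors e_i (columns):")
    print(E)
    print("proportion of total variance:", lam / lam.sum())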

Suppose X is distributed as Np(µ, Σ). We know that the density of X is
constant on the µ-centered ellipsoids

    (x − µ)′Σ⁻¹(x − µ) = c²

which have axes ±c√λi ei, i = 1, 2, . . . , p, where the (λi, ei) are the eigenvalue-
eigenvector pairs of Σ. Assuming µ = 0, the equation above can be rewritten
as

    c² = x′Σ⁻¹x = (1/λ1)(e′1x)² + (1/λ2)(e′2x)² + · · · + (1/λp)(e′px)²
       = (1/λ1)y1² + (1/λ2)y2² + · · · + (1/λp)yp²

where e′1x, e′2x, . . . , e′px are recognized as the principal components of x. The
equation above defines an ellipsoid in a coordinate system with axes y1, y2, . . . , yp
lying in the directions of e1, e2, . . . , ep, respectively.

Principal Components Obtained from Standardized Variables
Principal components may also be obtained from the standardized variables

    Zi = (Xi − µi)/√σii,   i = 1, 2, . . . , p

or, in matrix notation, Z = (V^{1/2})⁻¹(X − µ). Clearly E(Z) = 0 and
Cov(Z) = (V^{1/2})⁻¹Σ(V^{1/2})⁻¹ = ρ.

Results 5.4 The ith principal component of the standardized variables
Z′ = [Z1, Z2, . . . , Zp] with Cov(Z) = ρ is given by

    Yi = e′iZ = e′i(V^{1/2})⁻¹(X − µ),   i = 1, 2, . . . , p

Moreover,

    ∑_{i=1}^p Var(Yi) = ∑_{i=1}^p Var(Zi) = p

and

    ρ_{Yi,Zk} = eik √λi,   i, k = 1, 2, . . . , p

In this case (λ1, e1), (λ2, e2), . . . , (λp, ep) are the eigenvalue-eigenvector pairs
for ρ, with λ1 ≥ λ2 ≥ · · · ≥ λp ≥ 0.
Example 5.2 Consider the covariance matrix Σ and its correlation matrix ρ,

    Σ = ⎡ 1    4 ⎤        ρ = ⎡  1   .4 ⎤
        ⎣ 4  100 ⎦            ⎣ .4    1 ⎦

Obtain the principal components from the covariance matrix Σ and from the
correlation matrix ρ.
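The two sets of components can be compared numerically (a sketch assuming NumPy). The
components from Σ are dominated by the large variance of X2, while those from ρ weight
both standardized variables about equally, which illustrates why the choice between Σ
and ρ matters:

    import numpy as np

    Sigma = np.array([[1.,   4.],
                      [4., 100.]])
    rho   = np.array([[1.0, 0.4],
                      [0.4, 1.0]])

    for name, M in [("Sigma", Sigma), ("rho", rho)]:
        lam, E = np.linalg.eigh(M)
        lam, E = lam[::-1], E[:, ::-1]      # largest eigenvalue first
        print(name, "eigenvalues:", np.round(lam, 4))
        print(name, "first eigenvector e1:", np.round(E[:, 0], 4))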

Principal Components for Covariance Matrices with Special Structures

1. Σ is diagonal:

        ⎡ σ11   0   · · ·   0  ⎤
    Σ = ⎢  0   σ22  · · ·   0  ⎥
        ⎢  ⋮     ⋮    ⋱      ⋮  ⎥
        ⎣  0    0   · · ·  σpp ⎦

2. Σ has equal variances and equal covariances:

        ⎡  σ²  ρσ²  · · ·  ρσ² ⎤
    Σ = ⎢ ρσ²   σ²  · · ·  ρσ² ⎥     with correlation matrix
        ⎢   ⋮    ⋮    ⋱     ⋮  ⎥
        ⎣ ρσ²  ρσ²  · · ·   σ² ⎦

        ⎡ 1  ρ  · · ·  ρ ⎤
    ρ = ⎢ ρ  1  · · ·  ρ ⎥
        ⎢ ⋮   ⋮    ⋱    ⋮ ⎥
        ⎣ ρ  ρ  · · ·  1 ⎦
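For the first structure the eigenvectors are the coordinate unit vectors, so the principal
components are just the original (uncorrelated) variables and nothing is gained. For the
equicorrelation structure the largest eigenvalue of ρ is 1 + (p − 1)ρ, with eigenvector
proportional to the vector of 1's, and the remaining p − 1 eigenvalues all equal 1 − ρ.
A quick numerical check (a sketch assuming NumPy; p = 4 and ρ = 0.5 are arbitrary):

    import numpy as np

    p, r = 4, 0.5
    rho = (1 - r) * np.eye(p) + r * np.ones((p, p))   # equicorrelation matrix

    lam = np.sort(np.linalg.eigvalsh(rho))[::-1]
    print(lam)      # [2.5, 0.5, 0.5, 0.5] = [1 + (p-1)r, 1-r, 1-r, 1-r]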
Summarizing Sample Variation by Principal Components
Suppose the data x1, x2, . . . , xn represent n independent drawings from some
p-dimensional population with mean vector µ and covariance matrix Σ. These
data yield the sample mean vector x̄, the sample covariance matrix S, and the
sample correlation matrix R.
If S = {sik} is the p × p sample covariance matrix with eigenvalue-eigenvector
pairs (λ̂1, ê1), (λ̂2, ê2), . . . , (λ̂p, êp), the ith sample principal component is given
by

    ŷi = ê′ix = êi1x1 + êi2x2 + · · · + êipxp,   i = 1, 2, . . . , p

where λ̂1 ≥ λ̂2 ≥ · · · ≥ λ̂p ≥ 0 and x is any observation on the variables
X1, X2, . . . , Xp. Also

    Sample variance(ŷk) = λ̂k,   k = 1, 2, . . . , p
    Sample covariance(ŷi, ŷk) = 0,   i ≠ k
    Total sample variance = ∑_{i=1}^p sii = λ̂1 + λ̂2 + · · · + λ̂p
    r_{ŷi,xk} = êik √λ̂i / √skk,   i, k = 1, 2, . . . , p.
Example 5.3 (Summarizing sample variability with two sample principal
components) A census provided information, by tract, on five socioeconomic
variables for the Madison, Wisconsin, area. The data from 61 tracts are listed in
Table 8.5. These data produced the following summary statistics

    x̄′ = [4.47, 3.96, 71.42, 26.91, 1.64]

where the entries are, in order: total population (thousands), professional degree
(percent), employed age over 16 (percent), government employment (percent),
and median home value ($100,000); and

        ⎡  3.397   −1.102    4.306   −2.078   0.027 ⎤
        ⎢ −1.102    9.673   −1.513   10.953   1.203 ⎥
    S = ⎢  4.306   −1.513   55.626  −28.937  −0.044 ⎥
        ⎢ −2.078   10.953  −28.937   89.067   0.957 ⎥
        ⎣  0.027    1.203   −0.044    0.957   0.319 ⎦

Can the sample variation be summarized by one or two principal components?
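The question can be explored numerically from S (a sketch assuming NumPy; the cumulative
proportions of total sample variance indicate how many components are worth keeping):

    import numpy as np

    # Sample covariance matrix S of the five census-tract variables
    S = np.array([[ 3.397,  -1.102,   4.306,  -2.078,  0.027],
                  [-1.102,   9.673,  -1.513,  10.953,  1.203],
                  [ 4.306,  -1.513,  55.626, -28.937, -0.044],
                  [-2.078,  10.953, -28.937,  89.067,  0.957],
                  [ 0.027,   1.203,  -0.044,   0.957,  0.319]])

    lam, E = np.linalg.eigh(S)
    lam, E = lam[::-1], E[:, ::-1]

    print("eigenvalues:", np.round(lam, 2))
    print("cumulative proportion of total sample variance:",
          np.round(np.cumsum(lam) / lam.sum(), 3))
    print("first two sample components (columns):")
    print(np.round(E[:, :2], 3))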

The Number of Principal Components
There is always the question of how many components to retain. There is no
definitive answer to this question. Things to consider include

• the amount of total sample variance explained,

• the relative sizes of the eigenvalues (the variances of the sample components),

• the subject-matter interpretation of the components.

• In addition, a component associated with an eigenvalue near zero and, hence,
deemed unimportant, may indicate an unsuspected linear dependency in the
data.

A useful visual aid to determining an appropriate number of principal components
is a scree plot. With the eigenvalues ordered from largest to smallest, a scree
plot is a plot of λ̂i versus i—the magnitude of an eigenvalue versus its number.
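A scree plot is easy to produce; a minimal sketch (assuming NumPy and Matplotlib; the
helper name scree_plot is ours):

    import numpy as np
    import matplotlib.pyplot as plt

    def scree_plot(eigenvalues):
        """Plot the ordered eigenvalues against their index (a scree plot)."""
        lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
        i = np.arange(1, lam.size + 1)
        plt.plot(i, lam, "o-")
        plt.xlabel("component number i")
        plt.ylabel("eigenvalue")
        plt.title("Scree plot")
        plt.show()

    # e.g. scree_plot(np.linalg.eigvalsh(S)) with S from Example 5.3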
Example 5.4 (Summarizing sample variability with one sample principal
component) In a study of size and shape relationships for painted turtles,
Jolicoeur and Mosimann measured carapace length, width, and height. Their
data, reproduced in Table 6.9, suggest an analysis in terms of logarithms
(Jolicoeur generally suggests a logarithmic transformation in studies of size-and-
shape relationships). Perform a principal component analysis.

Interpreting the Sample Principal Components
The sample principal components have several interpretations.
• Suppose the underlying distribution of X is nearly Np(µ, Σ). Then the sample
principal components ŷi = ê′i(x − x̄) are realizations of the population principal
components Yi = e′i(X − µ), which have an Np(0, Λ) distribution. The
diagonal matrix Λ has entries λ1, λ2, . . . , λp and (λi, ei) are the eigenvalue-
eigenvector pairs of Σ.

• From the sample values xj , we can approximate µ by x̄ and Σ by S. If S is
positive definite, the contour consisting of all p × 1 vectors x satisfying

    (x − x̄)′S⁻¹(x − x̄) = c²

estimates the constant density contour (x − µ)′Σ⁻¹(x − µ) = c² of the
underlying normal density.

• Even when the normal assumption is suspect and the scatter plot may depart
somewhat from an elliptical pattern, we can still extract eigenvalues from S
and obtain the sample principal components.
Standardizing the Sample Principal Components

• Sample principal components are, in general, not invariant with respect to
changes in scale.

• Variables measured on different scales or on a common scale with widely
differing ranges are often standardized. For example, standardization is
accomplished by constructing

    zj = D^{-1/2}(xj − x̄) = [ (xj1 − x̄1)/√s11 , (xj2 − x̄2)/√s22 , . . . , (xjp − x̄p)/√spp ]′,
    j = 1, 2, . . . , n.

If z1, z2, . . . , zn are the standardized observations with covariance matrix R, the
ith sample principal component is

    ŷi = ê′iz = êi1z1 + êi2z2 + · · · + êipzp,   i = 1, 2, . . . , p

where (λ̂i, êi) is the ith eigenvalue-eigenvector pair of R, with

    Sample variance(ŷi) = λ̂i,   i = 1, 2, . . . , p
    Sample covariance(ŷi, ŷk) = 0,   i ≠ k

In addition,

    Total (standardized) sample variance = tr(R) = p = λ̂1 + λ̂2 + · · · + λ̂p

and

    r_{ŷi,zk} = êik √λ̂i,   i, k = 1, 2, . . . , p
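Computationally, working with standardized observations amounts to replacing S by R; a
sketch (assuming NumPy; the function name is ours):

    import numpy as np

    def standardized_principal_components(X):
        """Sample principal components based on R for an n x p data matrix X.

        Returns the eigenvalues of R, the eigenvectors (columns), and the
        n x p matrix of component scores."""
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z_j = D^{-1/2}(x_j - x_bar)
        R = np.corrcoef(X, rowvar=False)                   # sample correlation matrix
        lam, E = np.linalg.eigh(R)
        lam, E = lam[::-1], E[:, ::-1]
        return lam, E, Z @ E                               # column i holds the scores on y_hat_i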

Example 5.5 (Sample principal components from standardized data) The
weekly rates of return for five stocks (JP Morgan, Citibank, Wells Fargo, Royal
Dutch Shell, and ExxonMobil) listed on the New York Stock Exchange were
determined for the period January 2004 through December 2005. The weekly
rate of return is defined as (current week closing price − previous week closing
price)/(previous week closing price), adjusted for stock splits and dividends.
The data are listed in Table 8.4. The observations over 103 successive weeks
appear to be independently distributed, but the rates of return across stocks are
correlated, because, as one expects, stocks tend to move together in response
to general economic conditions. Standardize this data set and find the sample
principal components of the standardized data.

Example 5.6 (Components from a correlation matrix with a special
structure) Geneticists are often concerned with the inheritance of characteristics
that can be measured several times during an animal’s lifetime. Body weights
(in grams) for n = 150 female mice were obtained immediately after the birth
of their first four litters. The sample mean vector and sample correlation matrix
were, respectively,

    x̄′ = [39.88, 45.08, 48.11, 49.95]

and

        ⎡ 1.000  .7501  .6329  .6363 ⎤
    R = ⎢ .7501  1.000  .6925  .7386 ⎥
        ⎢ .6329  .6925  1.000  .6625 ⎥
        ⎣ .6363  .7386  .6625  1.000 ⎦

Find the sample principal components from R.
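Since R is given in full, the components can be computed directly (a sketch assuming
NumPy). Because all the correlations are positive and of similar size, the first component
turns out to be close to an equally weighted average of the four weights:

    import numpy as np

    R = np.array([[1.0000, 0.7501, 0.6329, 0.6363],
                  [0.7501, 1.0000, 0.6925, 0.7386],
                  [0.6329, 0.6925, 1.0000, 0.6625],
                  [0.6363, 0.7386, 0.6625, 1.0000]])

    lam, E = np.linalg.eigh(R)
    lam, E = lam[::-1], E[:, ::-1]

    print("eigenvalues (they sum to p = 4):", np.round(lam, 3))
    print("first sample principal component:", np.round(E[:, 0], 3))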

5.2 Factor Analysis and Inference for Structured Covariance
Matrices

• The essential purpose of factor analysis is to describe, if possible, the
covariance relationships among many variables in terms of a few underlying,
but unobservable, random quantities called factors.

• Factor analysis can be considered an extension of principal component
analysis. Both can be viewed as attempts to approximate the covariance
matrix Σ. However, the approximation based on the factor analysis model is
more elaborate.

• The primary question in factor analysis is whether the data are consistent
with a prescribed structure.

5.2.1 The Orthogonal Factor Model
• The observable random vector X, with p components, has mean µ and
covariance matrix Σ.
• The factor model postulates that X is linearly dependent upon a few
unobservable random variables F1, F2, . . . , Fm, called common factors, and p
additional sources of variation ε1, ε2, . . . , εp, called errors or, sometimes,
specific factors.
• In particular, the factor analysis model is

    X1 − µ1 = ℓ11F1 + ℓ12F2 + · · · + ℓ1mFm + ε1
    X2 − µ2 = ℓ21F1 + ℓ22F2 + · · · + ℓ2mFm + ε2
    ...
    Xp − µp = ℓp1F1 + ℓp2F2 + · · · + ℓpmFm + εp

or, in matrix notation,

    X − µ = LF + ε

The coefficient ℓij is called the loading of the ith variable on the jth factor,
so the matrix L is the matrix of factor loadings.
• The unobservable random vectors F and ε satisfy the following conditions:
F and ε are independent,
E(F) = 0, Cov(F) = I,
E(ε) = 0, Cov(ε) = Ψ, where Ψ is a diagonal matrix.
• Covariance structure for the orthogonal factor model:
1. Cov(X) = LL′ + Ψ, or

    Var(Xi) = ℓ²i1 + · · · + ℓ²im + ψi = h²i + ψi
    Cov(Xi, Xk) = ℓi1ℓk1 + · · · + ℓimℓkm

(the quantity h²i = ℓ²i1 + · · · + ℓ²im is the ith communality).
2. Cov(X, F) = L, or

    Cov(Xi, Fj) = ℓij
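The variance decomposition is easy to verify numerically for any choice of L and Ψ (a
sketch with made-up values, assuming NumPy):

    import numpy as np

    # Made-up loadings and specific variances: p = 4 variables, m = 2 factors
    L   = np.array([[0.9, 0.1],
                    [0.8, 0.3],
                    [0.2, 0.7],
                    [0.1, 0.8]])
    Psi = np.diag([0.18, 0.27, 0.47, 0.35])

    Sigma = L @ L.T + Psi                    # Cov(X) = LL' + Psi

    communalities = (L**2).sum(axis=1)       # h_i^2 = sum_j l_ij^2
    print(np.allclose(np.diag(Sigma), communalities + np.diag(Psi)))   # True
    print(np.round(Sigma, 3))                # off-diagonals: sum_j l_ij * l_kj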
Example 5.7 Consider the covariance matrix

        ⎡ 19  30   2  12 ⎤
    Σ = ⎢ 30  57   5  23 ⎥
        ⎢  2   5  38  47 ⎥
        ⎣ 12  23  47  68 ⎦

Verify the relation Σ = LL′ + Ψ for two factors.

Unfortunately, for the factor analyst, most covariance matrices cannot be
factored as LL′ + Ψ, where the number of factors m is much less than p.

Example 5.8 Let p = 3 and m = 1, and suppose the random variables X1, X2
and X3 have the positive definite covariance matrix

        ⎡  1  .9  .7 ⎤
    Σ = ⎢ .9   1  .4 ⎥
        ⎣ .7  .4   1 ⎦

Show that Σ cannot be factored by a factor analysis model with m = 1.
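One standard way to see this (a short sketch of the argument, with the arithmetic in
Python): with m = 1 the off-diagonal relations are σ12 = ℓ11ℓ21, σ13 = ℓ11ℓ31 and
σ23 = ℓ21ℓ31, which force ℓ²11 = σ12σ13/σ23, and the implied specific variance
ψ1 = 1 − ℓ²11 turns out to be negative:

    s12, s13, s23 = 0.9, 0.7, 0.4
    l11_sq = s12 * s13 / s23      # = 1.575, so |l11| > 1
    psi1   = 1.0 - l11_sq         # Var(X1) = l11^2 + psi1 must equal 1
    print(l11_sq, psi1)           # 1.575 -0.575: a negative specific variance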

Factor loadings L are determined only up to an orthogonal matrix T. Thus,
the loadings

    L* = LT and L

both give the same representation. The communalities, given by the diagonal
elements of LL′ = L*(L*)′, are also unaffected by the choice of T:

    X − µ = LF + ε = LTT′F + ε = L*F* + ε   (with F* = T′F)
    Σ = LL′ + Ψ = LTT′L′ + Ψ = L*(L*)′ + Ψ


Methods of Estimation
The Principal Component Solution of the Factor Model
The principal component analysis of the sample covariance
matrix S is specified in terms of its eigenvalue-eigenvector pairs
(λ̂1, ê1), (λ̂2, ê2), . . . , (λ̂p, êp), where λ̂1 ≥ λ̂2 ≥ · · · ≥ λ̂p. Let m < p be
the number of common factors. Then the matrix of estimated factor loadings
{ℓ̃ij} is given by

    L̃ = [ √λ̂1 ê1 ⋮ √λ̂2 ê2 ⋮ · · · ⋮ √λ̂m êm ]

The estimated specific variances are provided by the diagonal elements of the
matrix S − L̃L̃′, so

    Ψ̃ = diag(ψ̃1, ψ̃2, . . . , ψ̃p)   with   ψ̃i = sii − ∑_{j=1}^m ℓ̃²ij

Communalities are estimated as

    h̃²i = ℓ̃²i1 + ℓ̃²i2 + · · · + ℓ̃²im

The principal component factor analysis of the sample correlation matrix is
obtained by starting with R in place of S.
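A direct implementation of this estimator (a sketch assuming NumPy; the function name is
ours, and the argument may be either a covariance or a correlation matrix):

    import numpy as np

    def principal_component_factor_solution(S, m):
        """Principal component solution of the m-factor model from S (or R).

        Returns the p x m loading matrix L_tilde, the specific variances
        psi_tilde, and the communalities."""
        lam, E = np.linalg.eigh(S)
        lam, E = lam[::-1], E[:, ::-1]
        L_tilde = E[:, :m] * np.sqrt(lam[:m])     # columns sqrt(lam_j) * e_hat_j
        communalities = (L_tilde**2).sum(axis=1)  # h_tilde_i^2
        psi_tilde = np.diag(S) - communalities    # diagonal of S - L L'
        return L_tilde, psi_tilde, communalities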
• For the principal component solution, the estimated loadings for a given factor
do not change as the number of factors is increased.

• The choice of m can be based on the estimated eigenvalues in much the
same manner as with principal components.

• Analytically, we have

    Sum of squared entries of (S − (L̃L̃′ + Ψ̃)) ≤ λ̂²ₘ₊₁ + · · · + λ̂²ₚ

• Ideally, the contributions of the first few factors to the sample variance of
the variables should be large.

    (Proportion of total sample variance due to jth factor)
        = λ̂j / (s11 + s22 + · · · + spp)   for a factor analysis of S
        = λ̂j / p                            for a factor analysis of R

Example 5.9 In a consumer-preference study, a random sample of customers
was asked to rate several attributes of a new product. The responses, on a
7-point semantic differential scale, were tabulated and the attribute correlation
matrix constructed. Do a factor analysis of these consumer-preference data.

Example 5.10 Stock-price data consisting of n = 103 weekly rates of return on
p = 5 stocks were introduced in Example 5.5. Do a factor analysis of these data.

The Maximum Likelihood Method
Results 5.5 Let X1, X2, . . . , Xn be a random sample from Np(µ, Σ), where
Σ = LL′ + Ψ is the covariance matrix for the m common factor model. The
maximum likelihood estimators L̂, Ψ̂ and µ̂ = x̄ maximize the likelihood
function of Xj − µ = LFj + εj , j = 1, 2, . . . , n,

    L(µ, Σ) = (2π)^{-np/2} |Σ|^{-n/2}
              × exp[ −(1/2) tr( Σ⁻¹( ∑_{j=1}^n (xj − x̄)(xj − x̄)′ + n(x̄ − µ)(x̄ − µ)′ ) ) ]

subject to L̂′Ψ̂⁻¹L̂ being diagonal.

The maximum likelihood estimates of the communalities are

    ĥ²i = ℓ̂²i1 + ℓ̂²i2 + · · · + ℓ̂²im,   i = 1, 2, . . . , p

so

    (Proportion of total sample variance due to jth factor)
        = (ℓ̂²1j + ℓ̂²2j + · · · + ℓ̂²pj) / (s11 + s22 + · · · + spp)
In practice the analysis is often based on the standardized variables. Although the
likelihood in Results 5.5 is appropriate for S, not R, surprisingly, this practice is
equivalent to obtaining the maximum likelihood estimates L̂ and Ψ̂ based on the sample
covariance matrix S and then setting L̂z = V̂^{-1/2} L̂ and Ψ̂z = V̂^{-1/2} Ψ̂ V̂^{-1/2}.
Here V̂^{-1/2} is the diagonal matrix with the reciprocals of the sample standard
deviations (computed with the divisor n) on the main diagonal, and Z denotes the
standardized observations with sample mean 0 and sample standard deviation 1.
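The maximum likelihood fit itself is carried out numerically. One convenient option is
scikit-learn's FactorAnalysis estimator (a hedged sketch: it fits the same orthogonal
factor model by maximum likelihood, but its sign conventions and uniqueness constraint
need not coincide with the L̂′Ψ̂⁻¹L̂-diagonal convention used above):

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    def ml_factor_analysis(X, m):
        """Fit an m-factor model by maximum likelihood to standardized data (i.e. R)."""
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        fa = FactorAnalysis(n_components=m).fit(Z)
        L_hat   = fa.components_.T        # p x m estimated loadings
        psi_hat = fa.noise_variance_      # estimated specific variances
        return L_hat, psi_hat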

Example 5.11 Using the maximum likelihood method, do a factor analysis of the
stock-price data.

Example 5.12 (Factor analysis of Olympic decathlon data) Linden originally
conducted a factor analytic study of Olympic decathlon results for all 160
complete starts from the end of World War II until the mid-seventies. Following
his approach, we examine the n = 280 complete starts from 1960 through 2004.
The recorded values for each event were standardized and the signs of the timed
events changed so that large scores are good for all events. We, too, analyze
the correlation matrix based on all 280 cases.

Factor Rotation
If L̂ is the p × m matrix of estimated factor loadings obtained by any method
(principal component, maximum likelihood, and so forth), then

    L̂* = L̂T,   where TT′ = T′T = I,

is a p × m matrix of “rotated” loadings. Moreover, the estimated covariance
(or correlation) matrix remains unchanged, since

    L̂L̂′ + Ψ̂ = L̂TT′L̂′ + Ψ̂ = L̂*(L̂*)′ + Ψ̂

Example 5.13 (A first look at factor rotation) Lawley and Maxwell present
the sample correlation matrix of examination scores in p = 6 subject areas for
n = 220 male students. A maximum likelihood solution for m = 2 common
factors yields the estimates in Table 9.5.

Varimax (or normal varimax) criterion

Define ℓ̃*ij = ℓ̂*ij /ĥi to be the rotated coefficients scaled by the square
root of the communalities. Then the (normal) varimax procedure selects the
orthogonal transformation T that makes

    V = (1/p) ∑_{j=1}^m [ ∑_{i=1}^p ℓ̃*ij⁴ − ( ∑_{i=1}^p ℓ̃*ij² )² / p ]

as large as possible.

Scaling the rotated coefficients ℓ̂*ij has the effect of giving variables with small
communalities relatively more weight in the determination of simple structure.
After the transformation T is determined, the loadings ℓ̃*ij are multiplied by ĥi
so that the original communalities are preserved.
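A compact implementation of the rotation (a sketch of the commonly used SVD-based
iteration, assuming NumPy; for the normal varimax, divide each row of L̂ by ĥi before
calling it and rescale the result afterwards):

    import numpy as np

    def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
        """Return the varimax-rotated loadings L @ T and the rotation matrix T."""
        p, m = L.shape
        T = np.eye(m)
        crit_old = 0.0
        for _ in range(max_iter):
            Lr = L @ T
            B = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
            U, s, Vt = np.linalg.svd(B)
            T = U @ Vt
            crit_new = s.sum()
            if crit_new <= crit_old * (1 + tol):   # stop when the criterion stalls
                break
            crit_old = crit_new
        return L @ T, T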

Example 5.14 (Rotated loadings for the consumer-preference data)

Example 5.15 (Rotated loadings for the stock-price data)

Example 5.16 (Rotated loadings for the Olympic decathlon data)

Factor Scores

• The estimated values of the common factors, called factor scores, may also be
required. These quantities are often used for diagnostic purposes, as well as
inputs to a subsequent analysis.

• Factor scores are not estimates of unknown parameters in the usual sense.
Rather, they are estimates of values for the unobserved random factor vectors
Fj , j = 1, 2, . . . , n. That is, the factor score

    f̂j = estimate of the value fj attained by Fj (jth case)

• Normally the factor score approaches have two elements in common:

1. They treat the estimated factor loadings ℓ̂ij and specific variances ψ̂i as if
they were true values.
2. They involve linear transformations of the original data, perhaps centered
or standardized. Typically, the estimated rotated loadings, rather than
the original estimated loadings, are used to compute factor scores.
Factor Scores Obtained by Weighted Least Squares from the Maximum
Likelihood Estimates

    f̂j = (L̂′Ψ̂⁻¹L̂)⁻¹L̂′Ψ̂⁻¹(xj − x̄)
        = Δ̂⁻¹L̂′Ψ̂⁻¹(xj − x̄),   j = 1, 2, . . . , n

or, if the correlation matrix is factored,

    f̂j = (L̂′zΨ̂z⁻¹L̂z)⁻¹L̂′zΨ̂z⁻¹zj
        = Δ̂z⁻¹L̂′zΨ̂z⁻¹zj ,   j = 1, 2, . . . , n

where zj = D^{-1/2}(xj − x̄) and ρ̂ = L̂zL̂′z + Ψ̂z.

• If rotated loadings L̂* = L̂T are used in place of the original loadings, the
subsequent factor scores, f̂*j , are related to f̂j by f̂*j = T′f̂j , j = 1, 2, . . . , n.

• If the factor loadings are estimated by the principal component method, it
is customary to generate factor scores using an unweighted (ordinary) least
squares procedure. Implicitly, this amounts to assuming that the ψi are equal
or nearly equal. The factor scores are then

    f̂j = (L̂′L̂)⁻¹L̂′(xj − x̄)   or   f̂j = (L̂′zL̂z)⁻¹L̂′zzj

for standardized data.

Factor Scores Obtained by Regression

    f̂j = L̂′S⁻¹(xj − x̄),   j = 1, 2, . . . , n

or, if a correlation matrix is factored,

    f̂j = L̂′zR⁻¹zj ,   j = 1, 2, . . . , n

where zj = D^{-1/2}(xj − x̄) and ρ̂ = L̂zL̂′z + Ψ̂z.
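Both score types are short matrix computations once the loadings and specific variances
are available; a sketch for the covariance-matrix case (assuming NumPy; the function name
and argument layout are ours):

    import numpy as np

    def factor_scores(X, L_hat, Psi_hat, method="regression"):
        """Factor scores f_hat_j for each row of the n x p data matrix X.

        L_hat is p x m, Psi_hat is the p x p diagonal matrix of specific
        variances; 'wls' gives weighted least squares scores, 'regression'
        the regression scores based on S."""
        Xc = X - X.mean(axis=0)                     # x_j - x_bar
        if method == "wls":
            A = L_hat.T @ np.linalg.inv(Psi_hat)    # L' Psi^{-1}
            W = np.linalg.solve(A @ L_hat, A)       # (L' Psi^{-1} L)^{-1} L' Psi^{-1}
        else:
            S = np.cov(X, rowvar=False)
            W = np.linalg.solve(S, L_hat).T         # L' S^{-1}
        return Xc @ W.T                             # row j is f_hat_j'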
Example 5.17 (Computing factor scores) Compute factor scores by the least
squares and regression methods using the stock-price data discussed in Example 5.11.

Perspectives and a Strategy for Factor Analysis
At the present time, factor analysis still maintains the flavor of an art, and no
single strategy should yet be “chiseled into stone”. We suggest and illustrate
one reasonable option:

1. Perform a principal component factor analysis. This method is
particularly appropriate for a first pass through the data. (It is not required
that R or S be nonsingular.)
(a) Look for suspicious observations by plotting the factor scores. Also,
calculate standardized scores for each observation and squared distances.
(b) Try a varimax rotation.

2. Perform a maximum likelihood factor analysis, including a
varimax rotation.

3. Compare the solutions obtained from the two factor analyses.
(a) Do the loadings group in the same manner?
(b) Plot factor scores obtained for principal components against scores from
the maximum likelihood analysis.

4. Repeat the first three steps for other numbers of common factors
m. Do extra factors necessarily contribute to the understanding and
interpretation of the data?

5. For large data sets, split them in half and perform a factor
analysis on each part. Compare the two results with each other and
with that obtained from the complete data set to check the stability of the
solution.
