CAMI16 – Data Analytics
Module 4 - contents
• Principal Component Analysis: Extracting Principal Components,
Graphing of Principal Components, Some Sampling Distribution
Results, Component Scores, Large-Sample Inferences, Monitoring
Quality with Principal Components.
• Factor Analysis: Orthogonal Factor Model, Communalities, Factor
Solutions and Rotation.
Principal Component Analysis
• Principal component analysis (PCA) is a statistical procedure used to
reduce the dimensionality of a data set while retaining as much of its
variance as possible.
• It uses an orthogonal transformation to convert a set of observations
of possibly correlated variables into a set of values of linearly
uncorrelated variables called principal components.
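In symbols, one common way to state this is sketched below. The notation is assumed here for illustration: $\mathbf{X}$ is a random vector of $p$ variables with covariance matrix $\Sigma$, whose eigenvalue-eigenvector pairs are $(\lambda_1, \mathbf{e}_1), \dots, (\lambda_p, \mathbf{e}_p)$ with $\lambda_1 \ge \dots \ge \lambda_p \ge 0$.

```latex
\[
Y_i = \mathbf{e}_i^{\top}\mathbf{X}, \qquad
\operatorname{Var}(Y_i) = \lambda_i, \qquad
\operatorname{Cov}(Y_i, Y_k) = 0 \quad (i \neq k),
\qquad i, k = 1, \dots, p .
\]
```

Here $Y_i$ is the $i$-th principal component, and the total variance $\sum_i \operatorname{Var}(X_i)$ equals $\lambda_1 + \dots + \lambda_p$.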
PCA Algorithm
• Standardize the data: PCA is sensitive to the scale of the variables, so the
first step is usually to standardize the data so that every variable has a mean
of 0 and a standard deviation of 1.
• Calculate the covariance matrix: The next step is to calculate the covariance
matrix of the standardized data, which equals the correlation matrix of the
original data. This matrix shows how each variable is related to every other
variable in the dataset.
• Calculate the eigenvectors and eigenvalues: The eigenvectors and eigenvalues
of the covariance matrix are then calculated. The eigenvectors give mutually
orthogonal directions of variation in the data, while each eigenvalue gives
the variance along its eigenvector.
• Choose the principal components: The principal components are the
eigenvectors with the highest eigenvalues. These components represent the
directions in which the data varies the most and are used to transform the
data into a lower-dimensional space.
• Transform the data: The final step is to project the standardized data onto
the lower-dimensional space defined by the chosen principal components.
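The five steps above can be sketched in base R as follows. This is a minimal illustration, not a reference implementation: the toy matrix `X`, its values, and the choice `k <- 2` are assumptions made up for the example.

```r
# Toy data: 5 observations of 3 variables (made-up values for illustration)
X <- matrix(c(2,  4, 1,
              3,  6, 2,
              4,  7, 2,
              5,  9, 4,
              6, 11, 3),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2", "v3")))

# 1. Standardize: mean 0 and standard deviation 1 for every column
Z <- scale(X, center = TRUE, scale = TRUE)

# 2. Covariance matrix of the standardized data (same as cor(X))
S <- cov(Z)

# 3. Eigen decomposition; eigen() returns eigenvalues in decreasing order
eig <- eigen(S, symmetric = TRUE)

# 4. Keep the k eigenvectors with the largest eigenvalues (k is an assumed choice)
k <- 2
W <- eig$vectors[, 1:k, drop = FALSE]

# 5. Project the standardized data onto the chosen components
scores <- Z %*% W
print(scores)
```

The columns of `scores` are uncorrelated, and up to the sign of each column they should agree with the first `k` columns of `prcomp(X, scale. = TRUE)$x`.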
PCA Example
• Standardize the data set using $x_{new} = \dfrac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the variable
• Calculate the covariance matrix
• Calculate the eigenvalues
• Calculate the eigenvectors
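For reference, the three quantities computed in these steps are usually defined as follows. The notation is assumed here for illustration: $n$ observations $\mathbf{x}_1, \dots, \mathbf{x}_n$ with sample mean $\bar{\mathbf{x}}$, sample covariance matrix $S$, and eigenpairs $(\lambda_j, \mathbf{e}_j)$.

```latex
\[
S = \frac{1}{n-1} \sum_{i=1}^{n}
    (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{\top},
\qquad
\det(S - \lambda I) = 0,
\qquad
(S - \lambda_j I)\,\mathbf{e}_j = \mathbf{0}, \quad \mathbf{e}_j^{\top}\mathbf{e}_j = 1 .
\]
```

The eigenvalues are the roots $\lambda$ of the characteristic equation in the middle, and each eigenvector $\mathbf{e}_j$ is the unit-length solution of the linear system on the right for its eigenvalue $\lambda_j$.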
• Sort the eigenvalues and their corresponding eigenvectors
• λ = 2.51579324 , 1.0652885 , 0.39388704 , 0.02503121
• Pick k eigenvalues and form a matrix of eigenvectors
• If we choose the top 2 eigenvectors, the matrix will look like this:
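One way to judge the choice of k is the share of total variance carried by the retained eigenvalues. A minimal sketch using the four eigenvalues listed above (the variable names are made up for the example):

```r
# Eigenvalues from the example, already sorted in decreasing order
lambda <- c(2.51579324, 1.0652885, 0.39388704, 0.02503121)

# Proportion of total variance explained by each component
prop_var <- lambda / sum(lambda)

# Cumulative proportion: variance retained by the first k components
cum_var <- cumsum(prop_var)

print(round(prop_var, 4))
print(round(cum_var, 4))
```

With these values the first two components retain roughly 89.5% of the total variance, which is the usual argument for keeping k = 2 here.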
• Transform the original data
PCA Reconstruction
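• Transformed data × (top k eigenvectors)ᵀ = zero-mean data (exact only when
all eigenvectors are kept; otherwise an approximation)
• Zero-mean data + mean = original data (if the data were also scaled to unit
variance, multiply by the standard deviations before adding the mean)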
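The reconstruction can be checked numerically. The sketch below uses a made-up toy matrix and a hypothetical helper `reconstruct()`. It centers the data without scaling, so adding the mean back is the only undo step needed.

```r
# Toy data (made-up values): 6 observations, 3 variables
X <- matrix(c(1.0,  2.1, 0.5,
              2.0,  3.9, 1.1,
              3.0,  6.2, 1.4,
              4.0,  8.1, 2.2,
              5.0,  9.8, 2.4,
              6.0, 12.1, 3.1),
            ncol = 3, byrow = TRUE)

mu <- colMeans(X)              # column means
Xc <- sweep(X, 2, mu, "-")     # zero-mean data

eig <- eigen(cov(Xc), symmetric = TRUE)

# Hypothetical helper: reconstruct the data from the top k components
reconstruct <- function(k) {
  W <- eig$vectors[, 1:k, drop = FALSE]  # top k eigenvectors
  scores <- Xc %*% W                     # transformed data
  zero_mean_hat <- scores %*% t(W)       # back to the zero-mean space
  sweep(zero_mean_hat, 2, mu, "+")       # add the mean back
}

X_full <- reconstruct(3)   # all components kept
X_k2   <- reconstruct(2)   # only two components kept

print(max(abs(X_full - X)))   # effectively zero
print(mean((X_k2 - X)^2))     # non-zero reconstruction error
```

Keeping all three components recovers the data up to floating-point rounding, while dropping a component leaves a reconstruction error.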
Applications of PCA
• Data Visualization/Presentation
• Data Compression
• Noise Reduction
• Data Classification
• Trend Analysis
• Factor Analysis
PCA in R
# display the iris dataset
print(iris)
# use dim() to get the dimensions of the dataset
cat("Dimension:", dim(iris))
# Dimension: 150 5
# use nrow() to get the number of rows
cat("\nRow:", nrow(iris))
# Row: 150
# use ncol() to get the number of columns
cat("\nColumn:", ncol(iris))
# Column: 5
# use names() to get the variable names of the dataset
cat("\nName of Variables:", names(iris))
# Name of Variables: Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# get a statistical summary of the Sepal.Length variable
summary(iris$Sepal.Length)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   4.300   5.100   5.800   5.843   6.400   7.900
# perform PCA on the four numeric iris variables, standardized
PCA_iris <- prcomp(iris[, 1:4], scale. = TRUE)
# get a statistical summary of the PCA results
summary(PCA_iris)
# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.7084 0.9560 0.38309 0.14393
# Proportion of Variance 0.7296 0.2285 0.03669 0.00518
# Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
# scree plot of the component variances
plot(PCA_iris, type = "l")
# scatter plot of the first two components, coloured by species
library(ggfortify)
autoplot(PCA_iris, data = iris, colour = "Species")
# biplot of scores and loadings
biplot(PCA_iris)
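To connect the `prcomp()` output with the step-by-step algorithm, the sketch below compares it against an eigen decomposition of the correlation matrix and recomputes the component scores by hand. The variable names are chosen for this example only.

```r
# PCA via prcomp on the four numeric iris variables, standardized
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# The same quantities from an eigen decomposition of the correlation matrix
eig <- eigen(cor(iris[, 1:4]), symmetric = TRUE)

# Squared component standard deviations equal the eigenvalues
print(all.equal(pca$sdev^2, eig$values))

# Loadings agree with the eigenvectors up to the sign of each column
print(all.equal(abs(unname(pca$rotation)), abs(eig$vectors)))

# Component scores: standardized data projected onto the loadings
Z <- scale(iris[, 1:4])
scores <- Z %*% pca$rotation
print(all.equal(unname(scores), unname(pca$x)))

# First few component scores
print(head(pca$x, 3))
```

The component scores in `pca$x` are the coordinates of each flower in the principal component space. The first two columns are what the `autoplot()` scatter plot displays.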
Factor Analysis
• Factor Analysis is a method for modeling observed variables, and
their covariance structure, in terms of a smaller number of underlying
unobservable (latent) “factors.”
• Factor analysis is generally an exploratory/descriptive method that
requires many subjective judgments.
• In factor analysis, we model the observed variables as linear functions
of the “factors.”
• In principal components, we create new variables that are linear
combinations of the observed variables. In both PCA and FA, the
dimension of the data is reduced.
Orthogonal Factor Model
Model Assumptions
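In its standard textbook form, the orthogonal factor model and its assumptions can be written as below. The symbols are the conventional ones and are assumed here rather than taken from the slides: $\mathbf{X}$ is the $p$-dimensional observed vector with mean $\boldsymbol{\mu}$ and covariance $\Sigma$, $L$ is the $p \times m$ matrix of loadings $\ell_{ij}$, $\mathbf{F}$ holds the $m$ common factors, and $\boldsymbol{\varepsilon}$ holds the $p$ specific factors.

```latex
\[
\mathbf{X} - \boldsymbol{\mu} = L\,\mathbf{F} + \boldsymbol{\varepsilon}
\]
\[
E(\mathbf{F}) = \mathbf{0}, \quad \operatorname{Cov}(\mathbf{F}) = I, \quad
E(\boldsymbol{\varepsilon}) = \mathbf{0}, \quad
\operatorname{Cov}(\boldsymbol{\varepsilon}) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p), \quad
\operatorname{Cov}(\boldsymbol{\varepsilon}, \mathbf{F}) = 0
\]
\[
\Sigma = L L^{\top} + \Psi, \qquad
\sigma_{ii} = \underbrace{\ell_{i1}^2 + \dots + \ell_{im}^2}_{h_i^2 \ \text{(communality)}} + \psi_i
\]
```

The communality $h_i^2$ is the part of the variance of variable $i$ explained by the common factors, and $\psi_i$ is its specific variance.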
Estimating the Factors – Principal Component Method
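A minimal sketch of the principal component method, assuming the usual textbook recipe: the j-th column of the estimated loading matrix is sqrt(lambda_j) times e_j, taken from the eigen decomposition of the sample correlation matrix. The iris data and the choice of m = 2 factors are assumptions made for illustration.

```r
R <- cor(iris[, 1:4])               # sample correlation matrix
eig <- eigen(R, symmetric = TRUE)

m <- 2                              # number of factors (an assumed choice)

# Estimated loadings: j-th column is sqrt(lambda_j) * e_j
L <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), nrow = m)
rownames(L) <- colnames(R)
colnames(L) <- paste0("F", 1:m)

h2  <- rowSums(L^2)                 # communalities
psi <- 1 - h2                       # specific variances (diagonal of R is 1)

# Residual matrix: the part of R not reproduced by L L' + Psi
resid <- R - (L %*% t(L) + diag(psi))

print(round(cbind(L, communality = h2, specific = psi), 3))
print(round(resid, 3))
```

The diagonal of the residual matrix is zero by construction, and its off-diagonal entries indicate how well m factors reproduce the correlations. An orthogonal rotation such as `varimax(L)` from the stats package can then be applied to make the loadings easier to interpret.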