PRINCIPAL COMPONENT ANALYSIS
The main idea of principal component analysis (PCA) is to reduce the
dimensionality of a dataset consisting of many variables that are correlated with
each other, either heavily or lightly, while retaining as much of the variation
present in the dataset as possible. This is done by transforming the original
variables into a new set of variables, known as the principal
components (or simply, the PCs), which are orthogonal and ordered so that the
variation they retain from the original variables decreases as we move
down the order. In this way, the 1st principal component retains the
maximum variation that was present in the original variables. The principal
components are the eigenvectors of the covariance matrix, and hence they are
orthogonal.
Importantly, the dataset on which PCA is to be used must be scaled, as the
results are sensitive to the relative scaling of the variables. In layman's terms,
PCA is a method of summarizing data. Imagine some wine bottles on a dining table.
Each wine is described by its attributes like colour, strength, age, etc., but
redundancy arises because many of these attributes measure related properties.
So what PCA does in this case is summarize each wine in the stock with fewer
characteristics.
Intuitively, Principal Component Analysis can supply the user with a lower-
dimensional picture, a projection or "shadow" of this object when viewed from
its most informative viewpoint.
(Image source: Machine Learning lectures by Prof. Andrew Ng at Stanford University)
Dimensionality: It is the number of random variables in a dataset or
simply the number of features, or rather more simply, the number of
columns present in your dataset.
Correlation: It shows how strongly two variables are related to each
other. Its value ranges from -1 to +1. A positive value indicates
that when one variable increases, the other increases as well, while
a negative value indicates that when one increases, the other decreases.
The modulus (absolute value) indicates the strength of the relationship.
Orthogonal: Uncorrelated to each other, i.e., correlation between
any pair of variables is 0.
Eigenvectors: Eigenvectors and eigenvalues form a big domain in
themselves; let's restrict ourselves to the knowledge of them that we
require here. Consider a non-zero vector v. It is an
eigenvector of a square matrix A if Av is a scalar multiple of v, or
simply:
Av = λv
Here, v is the eigenvector and λ is the eigenvalue associated with it.
(A short NumPy check of this definition follows this list.)
Covariance Matrix: This matrix consists of the covariances between
the pairs of variables. The (i,j)-th element is the covariance
between the i-th and j-th variables.
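As a quick illustration of these definitions, here is a minimal NumPy sketch (the toy numbers are made up for illustration): it builds a 2x2 covariance matrix from two variables, verifies Av = λv for each eigenvector, and checks that the eigenvectors are orthogonal.

import numpy as np

# Two toy variables (hypothetical data, for illustration only)
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7])

A = np.cov(np.stack([x, y]))          # 2x2 covariance matrix (one row per variable)
eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]            # i-th eigenvector (a column)
    lam = eigenvalues[i]
    print(np.allclose(A @ v, lam * v))   # True: Av equals lambda * v

# Eigenvectors of a (symmetric) covariance matrix are orthogonal
print(np.isclose(eigenvectors[:, 0] @ eigenvectors[:, 1], 0.0))   # True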
Properties of Principal Components
Technically, a principal component can be defined as a linear combination of
optimally weighted observed variables. The output of PCA is this set of principal
components, whose number is less than or equal to the number of
original variables; fewer, in case we wish to discard or reduce the
dimensions in our dataset. The PCs possess some useful properties, which
are listed below:
1. The PCs are essentially the linear combinations of the original
variables; the weights vector in this combination is actually the
eigenvector found, which in turn satisfies the principle of least
squares.
2. The PCs are orthogonal, as already discussed.
3. The variation present in the PCs decreases as we move from the 1st
PC to the last one, hence the importance (illustrated in the sketch below).
The least important PCs are also sometimes useful in regression, outlier
detection, etc.
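A minimal scikit-learn sketch of these properties, assuming a small randomly generated dataset (the data and the choice of library here are illustrative, not part of the original example): the explained variance decreases from the 1st PC onwards, and the component vectors are orthogonal.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # 100 observations, 4 features
X[:, 1] += 0.8 * X[:, 0]               # introduce some correlation

pca = PCA()                            # keep all components
pca.fit(X)

# Property 3: variation retained decreases from PC1 to the last PC
print(pca.explained_variance_ratio_)   # values sorted in decreasing order

# Property 2: the PCs (rows of components_) are orthogonal
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(4)))   # True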
Implementing PCA on a 2-D Dataset
Step 1: Normalize the data
The first step is to normalize the data that we have so that PCA works properly.
This is done by subtracting the respective means from the numbers in the
respective columns. So if we have two dimensions X and Y, all X become
x - x̄ and all Y become y - ȳ. This produces a dataset whose mean is zero.
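A possible NumPy sketch of this centring step, using a made-up 2-D dataset (the numbers are illustrative only):

import numpy as np

# Hypothetical 2-D dataset: each row is an observation (X, Y)
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

# Subtract the column means so that each variable has mean zero
scaled_data = data - data.mean(axis=0)
print(scaled_data.mean(axis=0))   # approximately [0. 0.]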
Step 2: Calculate the covariance matrix
Since the dataset we took is 2-dimensional, this will result in a 2x2 Covariance
matrix.
Please note that Var[X1] = Cov[X1,X1] and Var[X2] = Cov[X2,X2].
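Continuing the sketch (and assuming the scaled_data array from Step 1), the 2x2 covariance matrix can be computed as follows:

# rowvar=False: each column of scaled_data is a variable (X or Y)
cov_matrix = np.cov(scaled_data, rowvar=False)
print(cov_matrix.shape)   # (2, 2)

# Var[X1] equals Cov[X1, X1] (the top-left diagonal entry)
print(np.isclose(cov_matrix[0, 0], np.var(scaled_data[:, 0], ddof=1)))   # True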
Step 3: Calculate the eigenvalues and eigenvectors
The next step is to calculate the eigenvalues and eigenvectors for the covariance
matrix. This is possible because it is a square matrix. λ is an eigenvalue
for a matrix A if it is a solution of the characteristic equation:
det( λI - A ) = 0
Here, I is the identity matrix of the same dimension as A, which is a required
condition for the matrix subtraction in this case, and 'det' is the
determinant of the matrix. For each eigenvalue λ, a corresponding eigenvector
v can be found by solving:
( λI - A )v = 0
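In practice we rarely solve the characteristic equation by hand. A NumPy sketch, continuing with the cov_matrix from Step 2, could be:

# eigh is suited to symmetric matrices such as a covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
print(eigenvalues)    # eigh returns the eigenvalues in ascending order
print(eigenvectors)   # column i is the eigenvector for eigenvalues[i]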
Step 4: Choosing components and forming a feature vector:
We order the eigenvalues from largest to smallest so that they give us the
components in order of significance. Here comes the dimensionality reduction
part. If we have a dataset with n variables, then we have the
corresponding n eigenvalues and eigenvectors. It turns out that the
eigenvector corresponding to the highest eigenvalue is the principal
component of the dataset, and it is our call as to how many eigenvalues we
choose to proceed with in our analysis. To reduce the dimensions, we choose
the first p eigenvalues and ignore the rest. We do lose some information in
the process, but if the discarded eigenvalues are small, we do not lose much.
Next we form a feature vector, which is a matrix of vectors, in our case the
eigenvectors; in fact, only those eigenvectors that we want to proceed with.
Since we have just 2 dimensions in the running example, we can either
choose the one corresponding to the greater eigenvalue or simply take both.
Feature Vector = (eig1, eig2)
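A sketch of this ordering and selection step, continuing from the arrays above (p = 1 is just an illustrative choice):

# Sort the eigenvalues (and their eigenvectors) from largest to smallest
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

p = 1                                   # keep only the first principal component
feature_vector = eigenvectors[:, :p]    # columns are the chosen eigenvectors
print(feature_vector.shape)             # (2, p)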
Step 5: Forming Principal Components:
This is the final step, where we actually form the principal components using
all the math we did up to here. For this, we take the transpose of the
feature vector and left-multiply it with the transpose of the scaled version of the
original dataset.
NewData = FeatureVectorᵀ x ScaledDataᵀ
Here,
NewData is the Matrix consisting of the principal components,
FeatureVector is the matrix we formed using the eigenvectors we chose to
keep, and
ScaledData is the scaled (mean-centred) version of the original dataset.
(The superscript 'T' denotes the transpose of a matrix, which is formed by
interchanging the rows and columns. In particular, a 2x3 matrix
has a transpose of size 3x2.)
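Following the formula above, the final projection might look like this in NumPy (using the feature_vector and scaled_data arrays from the earlier steps):

# NewData = FeatureVector^T x ScaledData^T
new_data = feature_vector.T @ scaled_data.T
print(new_data.shape)   # (p, number_of_observations)

# Equivalently, keeping observations as rows (a common library convention):
# projected = scaled_data @ feature_vector   # shape (number_of_observations, p)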
If we go back to the theory of eigenvalues and eigenvectors, we see that,
essentially, eigenvectors provide us with information about the patterns in the
data. In particular, in the running 2-D example, if we plot the
eigenvectors on the scatterplot of the data, we find that the principal eigenvector
(corresponding to the largest eigenvalue) fits the data well. The
other one, being perpendicular to it, does not carry much information, and
hence we lose little by discarding it, thereby reducing the
dimension.
All the eigenvectors of a symmetric matrix, such as the covariance matrix, are
perpendicular to each other. So, in PCA, what we do is represent or transform
the original dataset using these orthogonal (perpendicular) eigenvectors
instead of representing it on the usual x and y axes. We have now expressed our
data points as a combination of contributions from both x and y. The difference
arises when we actually disregard one or more eigenvectors, thereby reducing the
dimension of the dataset. Otherwise, if we take all the eigenvectors into account,
we are just transforming the coordinates and hence not serving the purpose
(see the check below).
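A quick check of this last point, under the same toy setup as the steps above: if we keep both eigenvectors, the transformation is just a change of coordinates and the centred data can be recovered exactly.

# Keep all eigenvectors: the transform is invertible, so no information is lost
full_feature_vector = eigenvectors                    # both columns
transformed = scaled_data @ full_feature_vector
recovered = transformed @ full_feature_vector.T
print(np.allclose(recovered, scaled_data))            # True: only the coordinates changed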
Applications of Principal Component Analysis
PCA is predominantly used as a dimensionality reduction technique in
domains like facial recognition, computer vision and image compression. It is
also used for finding patterns in data of high dimension in the field of finance,
data mining, bioinformatics, psychology, etc.
PCA for images:
You must have wondered many a time how a machine can read images or do
some calculations using just images and no numbers. We will try to answer a
part of that now. For simplicity, we will restrict our discussion to square
images only. Any square image of size NxN pixels can be represented as
an NxN matrix where each element is the intensity value of a pixel. (The
image is formed by placing the rows of pixels one after the other to form one
single image.) So if we have a set of images, we can form a data matrix out of
these image matrices by treating each image's pixels as one long vector; we are
then ready to run principal component analysis on it. How is it useful?
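A hedged sketch of this setup (the image array, sizes, and number of components below are all hypothetical): each NxN image is flattened into a row of N*N pixel intensities, the rows are stacked into a data matrix, and PCA is fitted on that matrix.

import numpy as np
from sklearn.decomposition import PCA

N = 32                                       # hypothetical image side length
num_images = 200                             # hypothetical number of images
rng = np.random.default_rng(0)
images = rng.random((num_images, N, N))      # stand-in for real image data

X = images.reshape(num_images, N * N)        # one flattened image per row

pca = PCA(n_components=50)                   # keep 50 components (an arbitrary choice)
X_transformed = pca.fit_transform(X)         # each image now described by 50 numbers
print(X_transformed.shape)                   # (200, 50)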
Say you are given an image to recognize that is not a part of the previous
set. The machine checks the differences between the to-be-recognized image
and each of the principal components. It turns out that the process performs
well if PCA is applied and the differences are taken from the 'transformed'
matrix. Also, applying PCA gives us the liberty to leave out some of the
components without losing much information, thus reducing the
complexity of the problem.
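One common way to realise this in practice (an eigenfaces-style sketch, not necessarily exactly what is described above; it reuses the fitted pca and X_transformed from the previous sketch): project the new image into the component space and compare it with the stored images there.

# Project a new (flattened) image into the learned component space
new_image = rng.random((1, N * N))           # stand-in for the image to recognise
new_transformed = pca.transform(new_image)   # shape (1, 50)

# Distances in the transformed space; the closest stored image is the best match
distances = np.linalg.norm(X_transformed - new_transformed, axis=1)
best_match = int(np.argmin(distances))
print(best_match)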
For image compression, by leaving out the less significant eigenvectors, we can
actually decrease the size of the image for storage. Note, however, that
reproducing the original image from this compressed representation loses some
information, for obvious reasons.
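A possible compression sketch along these lines, reusing the fitted pca and the arrays from the image sketch above: keep only the retained components, reconstruct, and observe that the result is an approximation.

# Reconstruct images from only the retained components
X_reconstructed = pca.inverse_transform(X_transformed)     # back to N*N pixels per image
reconstructed_images = X_reconstructed.reshape(num_images, N, N)

# The reconstruction is only approximate: the discarded components carried some information
error = np.mean((images - reconstructed_images) ** 2)
print(error)   # small but non-zero reconstruction error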