Principal Component Analysis (PCA)


Kavishka Abeywardana · 8 min read · Jun 17, 2024

Source: Towards Data Science

Suppose you are a scientist. You want to measure and understand the behavior of a
system. You set up an experiment and measure various quantities. In modern
experimental settings, you have the resources to collect large amounts of data and
features.

However, the data points can appear clouded, unclear, and redundant. You won’t be able to understand the real structure of the data. Let’s understand this using a simple toy example.

A toy example

We are studying an ideal spring (massless and frictionless). Because it is ideal, the
mass must oscillate indefinitely along the x-axis. The system is straightforward and
can be explained as a function of x.

However, as an experimenter, we don’t know how many axes and dimensions are
important to explain the system. Thus, we measure the ball’s position in 3-
dimensional space using three cameras. At 200 Hz, each camera captures an image
indicating the 2-dimensional position of the ball. Since we don’t have prior
knowledge about the system, we don’t know the optimal directions for the three
cameras.

Moreover, air, imperfect cameras, and less ideal springs can add noise to the
system. Thus, making assumptions directly from the data becomes even harder.

If we knew the system dynamics, we would directly measure the displacement along the x-axis using a single camera. However, now we have to extract the x-axis from the complex and arbitrarily collected data set.

Why do we need PCA?

During the data collection process, we might collect unnecessary features. This
increases the complexity of the data set and dilutes the insights. Our goal is to find
the most important features or feature combinations.

We could directly pick a few features using our intuition about the data set. In PCA, however, we choose the most important components (not features): we transform the existing feature set (suppose we have 1,000 features) and generate a small number of new features (say, 10).

These new features give us insights into the data set. Due to the richness of these components, we can use them in other machine-learning tasks (supervised learning). This reduces complexity and helps avoid overfitting. Thus, PCA is a good preprocessing step.

Capturing the variance


Suppose we have a two-dimensional data set.

We have to map this data set into a one-dimensional space. Our objective is to
capture the maximum amount of variance. What would be the solution?

The red line is the best one-dimensional axis. We orthogonally map each point onto
the red line. Observe that this new line is a mixture of the two features.

Now we have to formulate this process. We can use two approaches.

1. Maximize the variance.

2. Minimize the reconstruction error.

Reconstruction error is the squared distance between the projected point and the original point. This is given by the perpendicular distance from the point to the line. We want to minimize this difference.

Variance is the mean squared distance between each mapped point and the mean of the mapped data set. We want to maximize this variance.

We can easily prove that both of these approaches give the same result. They are like
two sides of the same coin.


||d|| is the deviation of the single projected point from the mean (assume that the points are mean-centered). ||X|| is the magnitude of the vector related to the point. ||e|| is the reconstruction error.

From the Pythagoras theorem, we can derive the following result.
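In the notation above:

```latex
\|X\|^2 = \|d\|^2 + \|e\|^2
```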

We can sum this all over the data set and take the mean.
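Averaged over all n points:

```latex
\frac{1}{n}\sum_{i=1}^{n}\|X_i\|^2
  = \frac{1}{n}\sum_{i=1}^{n}\|d_i\|^2
  + \frac{1}{n}\sum_{i=1}^{n}\|e_i\|^2
```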

The square of the standard deviation is the variance. Thus, we can derive the
following relation.
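Rearranging, with the left-hand total fixed by the data:

```latex
\underbrace{\frac{1}{n}\sum_{i}\|d_i\|^2}_{\text{variance}}
  = \underbrace{\frac{1}{n}\sum_{i}\|X_i\|^2}_{\text{constant}}
  - \underbrace{\frac{1}{n}\sum_{i}\|e_i\|^2}_{\text{mean reconstruction error}}
```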


Thus, when we increase the variance, the reconstruction error drops.

Loss function
We can use both approaches to derive the loss function for the optimization problem. Assume that we use a projection matrix w to project the data points from the original data space into the selected subspace. X is the data matrix.

From the reconstruction error viewpoint, we can write the following loss function.
We use the L2 norm.
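A sketch of this loss, assuming w is a unit vector and the rows of X are the data points:

```latex
L(w) = \|X - X w w^{\top}\|^2
```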

From the variance-maximizing viewpoint, we can write a loss function in the following way. Since we are maximizing the variance, we add a negative sign to the loss value. This forces us to minimize the loss function.

Assuming the data points are mean-centered, we can use the squared distance from
the origin. We can calculate the squared distance using the dot product.
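Combining the two statements, the loss can be sketched as:

```latex
L(w) = -\frac{1}{n}\sum_{i=1}^{n}(w^{\top}x_i)^2 = -\frac{1}{n}\|Xw\|^2
```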


However, the model might try to maximize the variance simply by increasing the size of the w (projection) matrix. This is useless. Thus, we must introduce a constraint to limit the size of the projection matrix.
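A natural constraint (assumed here, and standard in PCA) is unit length:

```latex
\|w\|^2 = w^{\top}w = 1
```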

We will use the variance approach.

Optimization
We use Lagrange multipliers to optimize the problem. It’s simple. We plug our
constraint into the optimization problem. This gives us an unconstrained problem.
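With the unit-norm constraint, the unconstrained objective reads:

```latex
\mathcal{L}(w, \lambda) = -\frac{1}{n}\|Xw\|^2 + \lambda\,(w^{\top}w - 1)
```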

Now we can differentiate the Lagrangian and find the optimal points. Everything is quadratic. Thus, taking derivatives is simple.
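Setting the gradient with respect to w to zero, with C = XᵀX/n:

```latex
\frac{\partial \mathcal{L}}{\partial w} = -\frac{2}{n}X^{\top}X\,w + 2\lambda w = 0
\quad\Longrightarrow\quad
C\,w = \lambda w
```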

C is the covariance matrix. This looks very familiar. w is an eigenvector of the covariance matrix!

Thus, to maximize the variance, one should choose the eigenvector w with the maximum eigenvalue λ.

Subsequent eigenvectors represent subsequent principal components. We choose the eigenvectors in a greedy fashion.

We can easily calculate the eigenvalues and eigenvectors using numerical analysis
software.
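As a minimal sketch (hypothetical data; NumPy’s `eigh` handles the symmetric covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))         # hypothetical data: 500 samples, 3 features
X = X - X.mean(axis=0)                # mean-center the data

C = (X.T @ X) / X.shape[0]            # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigh is for symmetric matrices

# eigh returns eigenvalues in ascending order; reverse to get the
# largest-variance components first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
scores = X @ eigvecs[:, :k]           # project onto the top-k principal components
```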

Spectral Theorem
C is a symmetric p x p matrix. We can prove that such a matrix has p independent
(orthogonal to each other) eigenvectors.

This means that changing to the eigenbasis simply rotates our original coordinate system such that every basis vector is an eigenvector.

We will stack the eigenvectors and create a new matrix V. XV rotates the original axes. Let’s calculate the covariance based on the new coordinate system.

Since V contains the eigenvectors, the off-diagonal terms become zero. Thus, we can
rewrite the covariance matrix as follows.
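With Λ the diagonal matrix of eigenvalues:

```latex
\frac{1}{n}(XV)^{\top}(XV) = V^{\top}CV = \Lambda
\quad\Longrightarrow\quad
C = V\Lambda V^{\top}
```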

This is the standard eigendecomposition.

Relationship to singular value decomposition (SVD)


Consider the SVD of X.
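That is:

```latex
X = U S V^{\top}
```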

U is the left singular matrix. V is the right singular matrix. S is a diagonal matrix. Both U and V are orthogonal matrices. Thus, physically, SVD performs a rotation, a stretching, and another rotation. The diagonal elements of S are singular values.

Now let’s calculate the covariance matrix.
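Substituting the SVD and using UᵀU = I:

```latex
C = \frac{1}{n}X^{\top}X
  = \frac{1}{n}V S^{\top} U^{\top} U S V^{\top}
  = V\left(\frac{S^{2}}{n}\right)V^{\top}
```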

We end up with the eigendecomposition of C. Thus, we can say that the eigenvalues of C are the squared singular values divided by n. The eigenvectors are the columns of the right singular matrix of X.

Total Variance

Trace is the sum of the diagonal values of a square matrix. The diagonal elements of the covariance matrix give us the variance of each data feature. Thus, the trace of C gives us the total variance.

Let’s calculate the trace of the covariance matrix.
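Using C = VΛVᵀ and the cyclic property of the trace:

```latex
\operatorname{tr}(C) = \operatorname{tr}(V\Lambda V^{\top})
                     = \operatorname{tr}(\Lambda V^{\top}V)
                     = \operatorname{tr}(\Lambda)
                     = \sum_{i}\lambda_i
```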

Thus, the total variance is the sum of the eigenvalues of the covariance matrix.

Using this result, we can calculate the percentage of variance each eigenvalue
captures.
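The fraction captured by the i-th component is its eigenvalue divided by the sum of all eigenvalues:

```latex
\frac{\lambda_i}{\sum_{j}\lambda_j}
```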

Principal component regression (PCR)


PCA followed by regression is principal component regression (PCR). It is similar to
ridge regression.

We can write the results of a linear regression model in the following way.

U is the left singular matrix of X.
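One standard way to write this (a reconstruction; ŷ denotes the fitted values):

```latex
\hat{y}_{\text{OLS}} = X\hat{\beta} = U U^{\top} y = \sum_{i} u_i\,(u_i^{\top} y)
```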

In ridge regression, we add a ridge penalty.
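Each direction is then shrunk by a factor that depends on its singular value (a sketch consistent with the description below):

```latex
\hat{y}_{\text{ridge}} = \sum_{i} u_i\,\frac{s_i^2}{s_i^2 + \lambda}\,(u_i^{\top} y)
```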

s denotes the singular values. Singular values that are small compared to λ are shrunk toward zero, while larger singular values remain almost unchanged. The diagonal matrix has decreasing diagonal values.

PCR does hard thresholding where ridge regression does soft shrinkage: we specifically choose the components with the largest singular values and ignore the rest.
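A minimal PCR sketch with scikit-learn (hypothetical data; the pipeline and component count are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))                 # hypothetical: 200 samples, 50 features
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=200)

# PCR: keep only the top-k principal components, then regress on them.
pcr = make_pipeline(PCA(n_components=10), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))                         # R^2 on the training data
```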

Probabilistic PCA (PPCA)


We consider a latent variable model. Latent variables are the internal
representations of a system. We do not observe them.

Suppose the latent variables follow a spherical Gaussian distribution with unit variance.
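In symbols:

```latex
z \sim \mathcal{N}(0,\, I)
```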

Now we can derive a conditional probability distribution for the observed data.
Conditioning shifts the latent variable distribution.
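With W the loading matrix, μ the data mean, and σ² the noise variance (standard PPCA notation, assumed here):

```latex
x \mid z \sim \mathcal{N}(Wz + \mu,\; \sigma^{2}I)
```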

Now we can write the mean and the covariance of the marginal distribution.
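Integrating out z:

```latex
x \sim \mathcal{N}(\mu,\; WW^{\top} + \sigma^{2}I)
```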

We want to find the best parameters to explain X under the maximum likelihood estimation (MLE) framework. We can use the expectation-maximization (EM) algorithm. The maximum likelihood solution is a PCA solution.

The MLE of the W matrix contains the leading eigenvectors. If we assume a two-dimensional latent variable model, we will get the two leading eigenvectors. Most of the variation in the data can be captured by these latent variables. The remaining variances define the reconstruction error.

References

This article is based on the lecture by Dr. Dmitry Kobak (Winter Term 2020/21, University of Tübingen). He gives a beautiful explanation of the fundamentals of PCA.

If you are a little rusty about singular value decomposition, watch the short lecture
by Prof. Gilbert Strang at MIT OpenCourseWare

For a beautiful geometric explanation of Lagrange multipliers, watch the Khan Academy lecture by Grant Sanderson.

The toy problem is derived from “A Tutorial on Principal Component Analysis: Derivation, Discussion, and Singular Value Decomposition” by Jon Shlens at Princeton University.
