Principal Component Analysis (PCA) - by Kavishka Abeywardana - Jun, 2024 - Medium
Suppose you are a scientist. You want to measure and understand the behavior of a
system. You set up an experiment and measure various quantities. In modern
experimental settings, you have the resources to collect large amounts of data and
features.
However, the data points can appear clouded, unclear, and redundant, hiding the real structure of the data. Let's understand this using a simple toy example.
A toy example
We are studying an ideal spring (massless and frictionless). Because the system is ideal, the attached mass oscillates indefinitely along the x-axis. The system is simple and can be explained as a function of x alone.
However, as experimenters, we don't know in advance how many axes and dimensions are needed to explain the system. Thus, we measure the ball's position in three-dimensional space using three cameras. At 200 Hz, each camera captures an image indicating the two-dimensional position of the ball. Since we have no prior knowledge about the system, we don't know the optimal directions for the three cameras.
Moreover, air resistance, imperfect cameras, and a less-than-ideal spring add noise to the measurements. This makes drawing conclusions directly from the data even harder.
If we knew the system dynamics, we would directly measure the displacement along the x-axis using a single camera. Instead, we have to extract the x-axis from the noisy, redundant camera measurements.
During the data collection process, we might collect unnecessary features. This
increases the complexity of the data set and dilutes the insights. Our goal is to find
the most important features or feature combinations.
We could directly pick a few features using our intuition about the data set. In PCA, however, we choose the most important components (not features): we transform the existing feature set (say, 1000 features) into a small number of new features (say, 10).
These new features give us insights into the data set. Because these components are so informative, we can also use them in other machine-learning tasks (e.g., supervised learning). This reduces complexity and helps avoid overfitting, which makes PCA a good preprocessing step.
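The whole pipeline fits in a few lines of NumPy. This is a minimal sketch (my code, not the author's) that reduces a hypothetical 1000-feature data set to 10 components via the covariance eigenvectors:

```python
import numpy as np

# Illustrative sketch: reduce 1000 features to 10 principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))          # 500 samples, 1000 features
X = X - X.mean(axis=0)                    # mean-center each feature

C = (X.T @ X) / (X.shape[0] - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)      # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]         # sort eigenvalues descending
W = eigvecs[:, order[:10]]                # top-10 principal directions

Z = X @ W                                 # new 10-dimensional features
print(Z.shape)                            # (500, 10)
```

The columns of W are orthonormal, so Z is just the data expressed in a rotated, truncated coordinate system.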
Suppose we have to map a two-dimensional data set into a one-dimensional space. Our objective is to capture the maximum amount of variance. What would be the solution?
The red line is the best one-dimensional axis. We orthogonally map each point onto
the red line. Observe that this new line is a mixture of the two features.
The reconstruction error is the squared distance between the projected point and the original point, i.e., the squared perpendicular distance from the point to the line. We want to minimize this error.
The variance is the mean squared distance between each projected point and the mean of the projected data set. We want to maximize this variance.
We can easily prove that both of these approaches give the same result. They are like
two sides of the same coin.
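We can check this equivalence numerically. The sketch below (my illustration, not the author's code) scans unit directions in 2-D and confirms that the direction maximizing the projected variance is exactly the one minimizing the mean squared reconstruction error:

```python
import numpy as np

# Mean-centered, anisotropic 2-D data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) @ np.array([[3.0, 0.0], [1.0, 1.0]])
X = X - X.mean(axis=0)

angles = np.linspace(0.0, np.pi, 1000)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors w

proj = X @ dirs.T                         # w^T x for every point and w
variance = (proj ** 2).mean(axis=0)       # projected variance per direction

# Reconstruction error: distance from each point to its projection.
recon = proj.T[:, :, None] * dirs[:, None, :]            # (1000, 300, 2)
error = ((X[None, :, :] - recon) ** 2).sum(axis=2).mean(axis=1)

print(variance.argmax() == error.argmin())  # True: same optimal direction
```

For every direction, variance plus error equals the (fixed) mean squared norm of the data, which is exactly why the two objectives agree.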
||d|| is the deviation of a single projected point from the mean (assume the points are mean-centered). ||X|| is the magnitude of the vector for the original point. ||e|| is the reconstruction error.
We can sum this all over the data set and take the mean.
The square of the standard deviation is the variance. Thus, we can derive the
following relation.
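That relation can be written out explicitly (my reconstruction, using the ||d||, ||X||, ||e|| notation above). Each orthogonal projection forms a right triangle, so by the Pythagorean theorem:

```latex
\|X_i\|^2 = \|d_i\|^2 + \|e_i\|^2
\qquad\Longrightarrow\qquad
\underbrace{\frac{1}{n}\sum_{i=1}^{n}\|X_i\|^2}_{\text{fixed by the data}}
  \;=\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n}\|d_i\|^2}_{\text{variance}}
  \;+\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n}\|e_i\|^2}_{\text{reconstruction error}}
```

Since the left-hand side is fixed by the data, maximizing the variance term is the same as minimizing the error term.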
Loss function
We can use either viewpoint to derive the loss function for the optimization problem. Assume that we use a projection matrix w to project the data points from the original data space into the selected subspace, and let X be the data matrix.
From the reconstruction error viewpoint, we can write the following loss function.
We use the L2 norm.
Assuming the data points are mean-centered, we can use the squared distance from
the origin. We can calculate the squared distance using the dot product.
However, the model might try to maximize the variance simply by increasing the norm of the projection matrix w. This is meaningless. Thus, we must introduce a constraint limiting the norm of w.
Optimization
We use Lagrange multipliers to solve the optimization problem. It's simple: we plug the constraint into the objective, which gives us an unconstrained problem. Now we can differentiate and find the stationary points. Everything is quadratic, so taking derivatives is straightforward.
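A sketch of the derivation (with C denoting the covariance matrix of the mean-centered data):

```latex
\max_{w}\; w^\top C w \quad \text{subject to} \quad w^\top w = 1
\qquad\Longrightarrow\qquad
\mathcal{L}(w,\lambda) = w^\top C w - \lambda\,(w^\top w - 1)

\frac{\partial \mathcal{L}}{\partial w} = 2Cw - 2\lambda w = 0
\qquad\Longrightarrow\qquad
C w = \lambda w
```

So the stationary points are eigenvectors of C, and the objective value at an eigenvector is wᵀCw = λ wᵀw = λ.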
Thus, to maximize the variance, one should choose the eigenvector w with the maximum eigenvalue λ.
We can easily calculate the eigenvalues and eigenvectors using numerical analysis
software.
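For example, with NumPy (my choice of library; any numerical package works), the eigendecomposition of a symmetric covariance matrix is one call:

```python
import numpy as np

# A small symmetric covariance matrix (example values).
C = np.array([[4.0, 2.0],
              [2.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(C)      # eigh exploits symmetry
w = eigvecs[:, np.argmax(eigvals)]        # first principal direction

print(np.allclose(C @ w, eigvals.max() * w))  # True: C w = lambda w
```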
Spectral Theorem
Let C be a symmetric p × p matrix. We can prove that such a matrix has p independent, mutually orthogonal eigenvectors. This means that changing to the eigenbasis simply rotates our original coordinate system so that every basis vector is an eigenvector.
We stack the eigenvectors as the columns of a new matrix V. XV rotates the data into the new axes. Let's calculate the covariance in this new coordinate system.
Since V contains the eigenvectors, the off-diagonal terms become zero. Thus, we can
rewrite the covariance matrix as follows.
U is the left singular matrix, V is the right singular matrix, and S is a diagonal matrix. Both U and V are orthogonal. Thus, geometrically, SVD performs a rotation, a stretching, and another rotation. The diagonal elements of S are the singular values.
Total Variance
The trace is the sum of the diagonal values of a square matrix. The diagonal elements of the covariance matrix give us the variance of each feature. Thus, the trace of C gives us the total variance.
Using this result, we can calculate the percentage of variance each eigenvalue
captures.
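A quick sketch of this calculation (the eigenvalues below are hypothetical example values, not from a real data set):

```python
import numpy as np

# Hypothetical eigenvalues of a 4 x 4 covariance matrix.
eigvals = np.array([5.0, 3.0, 1.5, 0.5])   # sum = trace(C) = total variance

explained = eigvals / eigvals.sum()        # fraction of variance per component
for i, r in enumerate(explained, start=1):
    print(f"PC{i}: {100 * r:.1f}%")
```

Here the first two components already capture 80% of the total variance, which is the usual justification for keeping only a few of them.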
We can write the results of a linear regression model in the following way.
s denotes the singular values. Singular values that are small compared to λ are shrunk towards zero, while larger singular values remain (almost) unchanged. The diagonal matrix has decreasing diagonal values.
PCA, in contrast, does hard thresholding instead of ridge regression's smooth shrinkage: we keep only the largest singular values and ignore the rest.
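The contrast can be stated precisely (my reconstruction): along the i-th singular direction, ridge regression applies a smooth shrinkage factor, while PCA keeps or discards each direction outright:

```latex
\text{ridge:}\quad \frac{s_i^2}{s_i^2 + \lambda}
\qquad\qquad
\text{PCA (top-}k\text{):}\quad
\begin{cases} 1, & i \le k \\ 0, & i > k \end{cases}
```

For s_i² ≪ λ the ridge factor goes to zero; for s_i² ≫ λ it stays near one, matching the description above.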
Suppose the latent variables follow a spherical Gaussian distribution with unit variance. We can then derive a conditional probability distribution for the observed data; conditioning shifts the latent variable distribution.
Now we can write the mean and the covariance of the marginal distribution.
We want to find the best parameters to explain X under the maximum likelihood estimation (MLE) framework, for example using the expectation-maximization (EM) algorithm. The maximum likelihood solution turns out to be a PCA solution.
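In standard probabilistic-PCA notation (my sketch; z is the latent variable, W the loading matrix, μ the data mean, σ² the noise variance):

```latex
z \sim \mathcal{N}(0,\, I), \qquad
x \mid z \sim \mathcal{N}(Wz + \mu,\; \sigma^2 I), \qquad
x \sim \mathcal{N}(\mu,\; WW^\top + \sigma^2 I)
```

The maximum-likelihood W spans the principal subspace of the data, which is why maximizing the likelihood recovers the PCA solution.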
The variation of the data can be captured by these latent variables. The variances define the reconstruction error.
References
This article is based on the lecture by Dr. Dmitry Kobak, Winter Term 2020/21 at the
University of Tübingen. He gives a beautiful explanation of the fundamentals of
PCA.
If you are a little rusty on singular value decomposition, watch the short lecture by Prof. Gilbert Strang on MIT OpenCourseWare.