
The document provides an overview of Principal Component Analysis (PCA), a technique used to reduce the dimensionality of data while preserving as much variance as possible. It discusses the motivation for PCA, its algorithms, applications, and theoretical foundations, emphasizing its utility in data visualization, noise reduction, and data classification. However, it also notes PCA's limitations, such as its inability to capture non-linear structures and lack of awareness of class labels.


Principal Component Analysis
Source: Introduction to Machine Learning
Computing Science 466 / 551
R. Greiner, B. Póczos, University of Alberta
https://webdocs.cs.ualberta.ca/~greiner/C-466/

ABDBM © Ron Shamir 1


Contents
• Motivation
• PCA algorithms
• Applications
• PCA theory

Some of these slides are taken from:
• Karl Booksh research group
• Tom Mitchell
• Ron Parr
Data Visualization
Example:
• Given 53 blood and urine measurements (features) from 65 individuals
• How can we visualize the measurements?


Data Visualization
• Matrix format (65 x 53): rows = instances (individuals), columns = features

      H-WBC    H-RBC    H-Hgb    H-Hct    H-MCV    H-MCH    H-MCHC
A1   8.0000   4.8200  14.1000  41.0000  85.0000  29.0000  34.0000
A2   7.3000   5.0200  14.7000  43.0000  86.0000  29.0000  34.0000
A3   4.3000   4.4800  14.1000  41.0000  91.0000  32.0000  35.0000
A4   7.5000   4.4700  14.9000  45.0000 101.0000  33.0000  33.0000
A5   7.3000   5.5200  15.4000  46.0000  84.0000  28.0000  33.0000
A6   6.9000   4.8600  16.0000  47.0000  97.0000  33.0000  34.0000
A7   7.8000   4.6800  14.7000  43.0000  92.0000  31.0000  34.0000
A8   8.6000   4.8200  15.8000  42.0000  88.0000  33.0000  37.0000
A9   5.1000   4.7100  14.0000  43.0000  92.0000  30.0000  32.0000

Difficult to see the correlations between the features...
Data Visualization
• Spectral format (65 plots, one for each person)

[Figure: measurement value vs. measurement index, one curve per person]

Difficult to compare the different patients...
Data Visualization
• Spectral format (53 plots, one for each feature)

[Figure: feature value (e.g. H-Bands) vs. person index, one curve per feature]

Difficult to see the correlations between the features...
Data Visualization
• Bi-variate and tri-variate plots

[Figure: bi-variate scatter of C-LDH vs. C-Triglycerides; tri-variate scatter of M-EPI vs. C-LDH vs. C-Triglycerides]

How can we visualize the other variables???
... difficult to see in 4- or higher-dimensional spaces...
Data Visualization
• Is there a better representation than the coordinate axes?

• Is it really necessary to show all 53 dimensions?
  – ... what if there are strong correlations between some of the features?

• How could we find the smallest subspace of the 53-D space that keeps the most information about the original data?

• A solution: Principal Component Analysis
Principal Component Analysis

PCA: orthogonal projection of the data onto a lower-dimensional linear space that...
• maximizes the variance of the projected data (the purple line), and
• minimizes the mean squared distance between the data points and their projections (the blue segments)
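These two criteria coincide. As a short check (not spelled out on the slide), for centered data and a unit-norm direction w:

```latex
\[
\frac{1}{m}\sum_{i=1}^{m}\bigl\|x_i - w w^{T} x_i\bigr\|^{2}
= \frac{1}{m}\sum_{i=1}^{m}\Bigl(\|x_i\|^{2} - (w^{T}x_i)^{2}\Bigr)
= \underbrace{\frac{1}{m}\sum_{i=1}^{m}\|x_i\|^{2}}_{\text{independent of } w}
  \;-\; \underbrace{\frac{1}{m}\sum_{i=1}^{m}(w^{T}x_i)^{2}}_{\text{variance of the projection}} ,
\]
```

so minimizing the mean squared reconstruction error over unit vectors w is the same as maximizing the variance of the projected data.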
PCA: the idea

• Given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible
  – E.g., find the best planar approximation to 3-D data
  – E.g., find the best 12-D approximation to 10^4-D data

• In particular, choose the projection that minimizes the squared error in reconstructing the original data


The Principal Components
• Vectors originating from the center of mass of the data

• Principal component #1 points in the direction of the largest variance.

• Each subsequent principal component...
  – is orthogonal to the previous ones, and
  – points in the direction of the largest variance of the residual subspace


2D Gaussian dataset
[Figure: samples from a 2-D Gaussian distribution]

1st PCA axis
[Figure: the same data with the first principal axis overlaid]

2nd PCA axis
[Figure: the same data with the second principal axis overlaid]


PCA: a sequential algorithm
Given the centered data {x_1, ..., x_m}, compute the principal vectors:

1st PCA vector (we maximize the variance of the projection of x):

\[
w_1 = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} (w^{T} x_i)^2
\]

kth PCA vector (we maximize the variance of the projection in the residual subspace):

\[
w_k = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m}
      \Bigl[ w^{T} \Bigl( x_i - \sum_{j=1}^{k-1} w_j w_j^{T} x_i \Bigr) \Bigr]^2
\]

[Figure: a point x, its projections w_1(w_1^T x) and w_2(w_2^T x), and the PCA reconstruction]

\[
x' = w_1 (w_1^{T} x) + w_2 (w_2^{T} x)
\]
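A minimal NumPy sketch of this sequential (greedy) procedure, written for illustration rather than taken from the course material; it assumes the data points are the columns of an already-centered matrix X:

```python
import numpy as np

def sequential_pca(X, k):
    """Greedy sketch of the sequential formulation: at each step, take the unit
    vector with the largest variance of the residual (deflated) data.
    X: (d, m) array of centered data points, one per column."""
    d, m = X.shape
    R = X.copy()                         # residual data
    W = np.zeros((d, k))
    for j in range(k):
        C = R @ R.T / m                  # covariance of the residual
        vals, vecs = np.linalg.eigh(C)   # ascending eigenvalues, orthonormal eigenvectors
        w = vecs[:, -1]                  # direction of largest residual variance
        W[:, j] = w
        R = R - np.outer(w, w @ R)       # deflation: R <- R - w w^T R
    return W                             # columns w_1, ..., w_k

# PCA reconstruction with the first two components: x' = w1(w1^T x) + w2(w2^T x)
X = np.random.default_rng(0).normal(size=(3, 200))
X = X - X.mean(axis=1, keepdims=True)
W = sequential_pca(X, k=2)
X_rec = W @ (W.T @ X)
```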
PCA algorithm
• Given data {x_1, ..., x_m}, compute the sample covariance matrix Σ:

\[
\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{T},
\qquad
\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i
\]

• PCA basis vectors = the eigenvectors of Σ

• Larger eigenvalue ⇒ more important eigenvector
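A quick NumPy check of this definition (an illustrative sketch, not from the slides); np.cov with bias=True uses the same 1/m normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # m = 100 data points as rows, d = 5 features

xbar = X.mean(axis=0)                    # sample mean
Xc = X - xbar                            # centered data
Sigma = Xc.T @ Xc / X.shape[0]           # Σ = (1/m) Σ_i (x_i − x̄)(x_i − x̄)^T

assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))
```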
PCA algorithm
PCA algorithm(X, k): top k eigenvalues/eigenvectors
% X = N × m data matrix,
% ... each data point x_i = column vector, i = 1..m

• x̄ ← (1/m) Σ_{i=1..m} x_i
• X ← subtract the mean x̄ from each column vector x_i in X
• Σ ← X X^T   ... covariance matrix of X (up to the 1/m factor, which does not change the eigenvectors)
• {λ_i, u_i}_{i=1..N} ← eigenvalues/eigenvectors of Σ, ordered so that λ_1 ≥ λ_2 ≥ ... ≥ λ_N
• Return {λ_i, u_i}_{i=1..k}   % the top k principal components
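A runnable version of this pseudocode (a sketch under the same conventions, with data points as the columns of X; not the course's reference implementation):

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix.
    X: (N, m) array, one data point per column.
    Returns the top-k eigenvalues and the corresponding eigenvectors (as columns)."""
    N, m = X.shape
    xbar = X.mean(axis=1, keepdims=True)        # mean of the data points
    Xc = X - xbar                               # subtract the mean from each column
    Sigma = Xc @ Xc.T / m                       # covariance (the 1/m factor does not change the eigenvectors)
    lam, U = np.linalg.eigh(Sigma)              # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(lam)[::-1]               # sort so that λ1 ≥ λ2 ≥ ... ≥ λN
    return lam[order][:k], U[:, order[:k]]

# Example: project 65 points in 53-D onto the top 2 principal components
X = np.random.default_rng(1).normal(size=(53, 65))
lam, U = pca(X, k=2)
Z = U.T @ (X - X.mean(axis=1, keepdims=True))   # (2, 65) matrix of projected coordinates
```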
Proof

GOAL: show that the variance-maximizing unit vectors are the top eigenvectors of Σ.
(Note that x is centered, so (1/m) Σ_i (w^T x_i)^2 is indeed the variance of the projection onto w.)

Justification of Algorithm II

GOAL: show that the eigendecomposition algorithm above solves the sequential variance-maximization problem.
Use Lagrange multipliers for the constraints.
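The derivation itself is in the omitted slide figures; what follows is a sketch of the standard Lagrange-multiplier argument for the first principal vector, added here for completeness:

```latex
% For centered data, \frac{1}{m}\sum_i (w^T x_i)^2 = w^T \Sigma w, so the task is to
% maximize w^T \Sigma w subject to w^T w = 1.
\[
L(w,\lambda) = w^{T}\Sigma w - \lambda\,(w^{T}w - 1),
\qquad
\nabla_{w} L = 2\Sigma w - 2\lambda w = 0
\;\;\Longrightarrow\;\;
\Sigma w = \lambda w .
\]
% Every stationary point is therefore an eigenvector of \Sigma, with objective value
% w^T \Sigma w = \lambda w^T w = \lambda; the maximum is attained at the eigenvector
% with the largest eigenvalue. Repeating the argument in the residual subspace gives
% w_2, w_3, ..., which is exactly what the eigendecomposition algorithm returns.
```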
PCA Applications
• Data Visualization
• Data Compression
• Noise Reduction
• Data Classification
• …
• In genomics (and in general): a first step in data exploration: does my data have inner structure? Is it clusterable? (See the sketch below.)
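A minimal sketch of this exploratory use (an illustration only: it assumes scikit-learn is available, and `expr` below is a synthetic stand-in for a samples-by-genes matrix, not data from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic expression matrix: 21 samples x 7913 genes, two slightly offset groups
expr = np.vstack([rng.normal(0.0, 1.0, size=(10, 7913)),
                  rng.normal(0.5, 1.0, size=(11, 7913))])

pca = PCA(n_components=2)               # centers the data internally
coords = pca.fit_transform(expr)        # (21, 2): one 2-D point per sample
print(pca.explained_variance_ratio_)    # fraction of variance kept by the top 2 PCs

# A scatter plot of coords[:, 0] vs. coords[:, 1] shows whether the samples
# separate into groups, i.e. whether the data look clusterable.
```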


[Figure] A PCA result of ALL 21 samples using 7,913 genes.
Red: good prognosis (upper right); blue: bad prognosis (lower left).
(Nishimura et al., GIW 2003)
PROMO demo



PCA shortcomings

PCA doesn’t know labels



PCA shortcoming (3)

PCA cannot capture NON-LINEAR structure


Summary: PCA
– Finds orthonormal basis for data
– Sorts dimensions in order of “importance” = variance
– Discards low importance dimensions

• Uses:
– Get compact description
– View and assess the data
– Ignore noise
– Improve clustering (hopefully)

• Not magic:
– Doesn’t know class labels
– Can only capture linear variations

• One of many tricks to reduce dimensionality!


Karl Pearson, father of mathematical statistics (1857-1936)

Invented PCA in 1901. Rediscovered multiple times in many fields.

