Principal Component Analysis
Chris Ding
Department of Computer Science and Engineering
University of Texas at Arlington
PCA is the procedure of finding intrinsic
dimensions of the data
• Data analysis
• Feature reduction
• Data visualization
Represent high dimensional data in low-dim space
High-dimensional data
Examples: gene expression, face images, handwritten digits
Application of feature reduction
• Face recognition
• Handwritten digit recognition
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification
Use PCA to approximate an image (a data matrix)
[Figure: a 112 x 92 face image (the original) compared with PCA reconstructions using k = 10, 20, 30, 40 components.]
Use PCA to approximate a set of images
[Figure: original images compared with PCA reconstructions using k = 1, 2, 4, 6 components.]
Display the characters in 2-dim space
[Figure: handwritten characters plotted by their two PCA coordinates,
$\tilde{x} = G^T x = (a_1^T x, \; a_2^T x)^T$.]
Intrinsic dimensions of the data
Samples of children: hours of study, hours on internet, vs. their age
[Figure: 3-D scatter plot; axes: hours on study/homework, hours on internet, children's age.]
Intrinsic dimensions of the data
Samples of children: hours of study, hours on internet, vs. their age
The data lie in a subspace (the intrinsic dimensions).
[Figure: the same 3-D scatter plot, with the samples lying close to a low-dimensional subspace; axes: hours on study/homework, hours on internet, children's age.]
PCA is the procedure of finding
intrinsic dimensions of the data
Find lines that best represent the data
PCA is a rotation of the space to the proper
directions (the principal directions)
Geometric picture of principal components (PCs)
[Figure: data cloud in X space with the first principal direction z1 drawn through it.]
• The 1st PC z1 is a minimum-distance fit to a line in X space.
• The 2nd PC z2 is a minimum-distance fit to a line in the plane perpendicular to the 1st PC.
PCs are a series of linear least squares fits to a sample,
each orthogonal to all the previous ones.
PCA represents data:
the closer the data lie to a linear subspace,
the more accurate the representation.
PCA Step 0: move the coordinate origin to the data center
This is equivalent to centering the data.
[Figure: the 3-D scatter plot with the origin shifted to the data mean; axes: hours on study/homework, hours on internet, children's age.]
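A minimal NumPy sketch of this centering step. The numbers below are made up purely for illustration (they are not data from the slides):

import numpy as np

# Hypothetical data: one column per child, rows = (age, study hours, internet hours)
X = np.array([[ 6.0,  8.0, 10.0, 12.0],
              [ 0.5,  1.0,  1.5,  2.5],
              [ 0.2,  0.8,  1.5,  3.0]])

x_bar = X.mean(axis=1, keepdims=True)   # the data center (mean of each variable)
X_centered = X - x_bar                  # Step 0: move the coordinate origin to the data center
print(X_centered.mean(axis=1))          # ~0 for every variable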
PCA Step 1: find a line that best represents the data
[Figure: the 3-D scatter plot with a candidate line and the projection error of each point onto it; axes: hours on study/homework, hours on internet, children's age.]
Which error to minimize? Minimize the sum of squared projection errors.
PCA Step 1: find the line that best represents the data
Fitting the data to a curve (a straight line, the simplest curve):
[Figure: the 3-D scatter plot with the best-fitting line through the data center; axes: hours on study/homework, hours on internet, children's age.]
Minimize the sum of squared projection errors.
This gives the 1st principal direction.
PCA directions are eigenvectors of the covariance matrix.
Repeating this process to find the 2nd, 3rd, … lines that best fit the remaining data,
the results are given by $u_2, u_3, \ldots, u_k$.
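As a small illustration of this statement (synthetic data and variable names of my own, not code from the lecture), the sketch below builds the covariance matrix and reads off the principal directions as its eigenvectors:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 500))            # hypothetical data: p=3 variables, n=500 samples
Xc = X - X.mean(axis=1, keepdims=True)   # center the data (Step 0)

S = (Xc @ Xc.T) / Xc.shape[1]            # covariance matrix
evals, evecs = np.linalg.eigh(S)         # eigh returns eigenvalues in ascending order
order = np.argsort(evals)[::-1]
u1 = evecs[:, order[0]]                  # 1st principal direction = top eigenvector of S
print(u1, evals[order])                  # direction u1 and the variances along u1, u2, u3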
Intrinsic dimensions of the data
Samples of children: hours of study and hours on the internet vs. their age
[Figure: the same 3-D scatter plot; axes: hours on study/homework, hours on internet, children's age.]
PCA from maximum variance
PCA from maximum spread
PCA represents data:
the closer the data lie to a linear subspace,
the more accurate the representation.
[Figure: the data cloud with two directions marked, one of smaller variance and one of larger variance.]
Larger spread = larger variance.
What is Principal Component Analysis?
• Principal component analysis (PCA)
– Reduce the dimensionality of a data set by finding a new set of
variables, smaller than the original set of variables
– Retains most of the sample's information.
– Useful for the compression and classification of data.
• By information we mean the variation present in the sample,
given by the correlations between the original variables.
– The new variables, called principal components (PCs), are
uncorrelated, and are ordered by the fraction of the total information
each retains.
Principal Component as maximum variance
Let $x = (x_1, x_2, \ldots, x_p)^T$ be a vector random variable in p dimensions/variables.
Given n observations/samples of x: $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$.

The first principal component:
define a scalar random variable as a linear combination of the dimensions,
$$z_1 = a_1^T x = \sum_{j=1}^{p} a_{j1} x_j, \qquad a_1 = (a_{11}, a_{21}, \ldots, a_{p1})^T,$$
such that $\mathrm{var}[z_1]$ is maximized.
Principal Component as maximum variance
Because
$$\mathrm{var}[z_1] = E\big((z_1 - \bar{z}_1)^2\big)
  = \frac{1}{n}\sum_{i=1}^{n}\big(a_1^T x_i - a_1^T \bar{x}\big)^2
  = a_1^T \Big[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T\Big] a_1
  = a_1^T S a_1,$$
where
$$S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T$$
is the covariance matrix, and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean.
In the following, we assume the data is centered: $\bar{x} = 0$.
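A quick numerical check of the identity $\mathrm{var}[z_1] = a_1^T S a_1$, using synthetic data and an arbitrary unit vector $a_1$ (all names and values below are illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))            # hypothetical data: p=5 variables, n=200 samples
Xc = X - X.mean(axis=1, keepdims=True)   # subtract the mean of each variable

S = (Xc @ Xc.T) / Xc.shape[1]            # S = (1/n) sum_i (x_i - xbar)(x_i - xbar)^T

a1 = rng.normal(size=5)
a1 = a1 / np.linalg.norm(a1)             # any unit vector a1
z1 = a1 @ Xc                             # z1 = a1^T x for every sample

print(np.var(z1), a1 @ S @ a1)           # the two values agree (up to floating point)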
Principal Component as maximum variance
To find $a_1$ that maximizes $\mathrm{var}[z_1]$ subject to $a_1^T a_1 = 1$,
let $\lambda$ be a Lagrange multiplier:
$$L = a_1^T S a_1 - \lambda\,(a_1^T a_1 - 1)$$
$$\frac{\partial L}{\partial a_1} = S a_1 - \lambda a_1 = 0$$
("eigen" is German for "own" / "characteristic"; the operator here is a matrix.)
Therefore $a_1$ is an eigenvector of $S$,
corresponding to the largest eigenvalue $\lambda_1$.
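One step is implicit here and worth spelling out: substituting the eigenvector condition back into the objective shows that the variance equals the chosen eigenvalue,
$$\mathrm{var}[z_1] = a_1^T S a_1 = a_1^T (\lambda a_1) = \lambda\, a_1^T a_1 = \lambda,$$
so the variance is maximized by taking $\lambda = \lambda_1$, the largest eigenvalue of $S$.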
Algebraic derivation of PCs
To find the next coefficient vector $a_2$ maximizing $\mathrm{var}[z_2]$,
subject to $\mathrm{cov}[z_2, z_1] = 0$ (uncorrelated)
and to $a_2^T a_2 = 1$:
first note that
$$\mathrm{cov}[z_2, z_1] = a_1^T S a_2 = \lambda_1\, a_1^T a_2,$$
then let $\lambda$ and $\phi$ be Lagrange multipliers, and maximize
$$L = a_2^T S a_2 - \lambda\,(a_2^T a_2 - 1) - \phi\, a_2^T a_1.$$
Algebraic derivation of PCs
$$L = a_2^T S a_2 - \lambda\,(a_2^T a_2 - 1) - \phi\, a_2^T a_1$$
$$\frac{\partial L}{\partial a_2} = S a_2 - \lambda a_2 - \phi\, a_1 = 0, \qquad \phi = 0$$
(multiplying on the left by $a_1^T$ shows $\phi = 0$), so
$$S a_2 = \lambda a_2 \quad\text{and}\quad a_2^T S a_2 = \lambda_2.$$
Algebraic derivation of PCs
We find that $a_2$ is also an eigenvector of $S$,
whose eigenvalue $\lambda_2$ is the second largest.
In general
$$\mathrm{var}[z_k] = a_k^T S a_k = \lambda_k.$$
• The kth largest eigenvalue of S is the variance of the kth PC.
• The kth PC $z_k$ retains the kth greatest fraction of the variation in the sample.
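A small numerical illustration of these two bullets (synthetic, anisotropic data of my own choosing): the eigenvalues of S, sorted in descending order, are the PC variances, and normalizing them gives the fraction of variation each PC retains.

import numpy as np

rng = np.random.default_rng(2)
# Hypothetical anisotropic data: 4 variables with different scales, 300 samples
X = rng.normal(size=(4, 300)) * np.array([[3.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)

S = (Xc @ Xc.T) / Xc.shape[1]
evals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues in descending order = variances of the PCs
print(evals / evals.sum())            # fraction of the total variation retained by each PC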
Projection to PCA subspace
• Main steps for computing PCA subspace
– Form the covariance matrix S.
– Compute its eigenvectors $\{a_i\}_{i=1}^{p}$.
– The PCA subspace is spanned by the first d eigenvectors $\{a_i\}_{i=1}^{d}$.
– The transformation G is given by
$$G = [a_1, a_2, \ldots, a_d],$$
$$\tilde{x} = G^T x = (a_1^T x,\; a_2^T x,\; \ldots,\; a_d^T x)^T,$$
$$x \in \mathbb{R}^p \;\longrightarrow\; \tilde{x} = G^T x \in \text{PCA subspace}.$$
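A hedged sketch of these steps on synthetic data (the variable names are mine, not the lecture's): form S, take its top-d eigenvectors as the columns of G, and project a sample with $G^T x$.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 100))                # hypothetical data: p=10 variables, n=100 samples
Xc = X - X.mean(axis=1, keepdims=True)

S = (Xc @ Xc.T) / Xc.shape[1]                 # covariance matrix
_, evecs = np.linalg.eigh(S)
d = 3
G = evecs[:, ::-1][:, :d]                     # G = [a1, ..., ad], the first d eigenvectors (p x d)

x = Xc[:, 0]                                  # one centered sample
x_tilde = G.T @ x                             # (a1^T x, a2^T x, ..., ad^T x)
print(x_tilde.shape)                          # (3,)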
Algebraic derivation of PCs
Assume $\bar{x} = 0$.
Form the matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{p \times n}$;
then
$$S = \frac{1}{n} X X^T.$$
Obtain the eigenvectors of S by computing the SVD of X:
$$X = U \Sigma V^T, \qquad X' = U^T X = \Sigma V^T.$$
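A brief numerical check of this relation on synthetic data (illustrative only): the left singular vectors of X are eigenvectors of $S = \frac{1}{n} X X^T$, with eigenvalues $\sigma_i^2 / n$.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 50))              # hypothetical data matrix, p=6, n=50
X = X - X.mean(axis=1, keepdims=True)     # centered, so x_bar = 0

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)    # X = U * diag(sigma) * V^T
S = (X @ X.T) / X.shape[1]

# Columns of U are eigenvectors of S, with eigenvalues sigma_i^2 / n:
print(np.allclose(S @ U, U * (sigma**2 / X.shape[1])))  # True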
Homework:
After you
(1) compute the covariance matrix S, and
(2) compute the first k eigenvectors of S, (u_1, …, u_k),
show that you can obtain (v_1, …, v_k) by doing matrix-vector
multiplications. No need to compute eigenvectors of the
kernel (Gram) matrix.
Reduction and Reconstruction
Dimension reduction: $X \in \mathbb{R}^{p \times n} \;\longrightarrow\; Y = G^T X \in \mathbb{R}^{d \times n}$, with $G^T \in \mathbb{R}^{d \times p}$.
Reconstruction: $Y = G^T X \in \mathbb{R}^{d \times n} \;\longrightarrow\; \hat{X} = G\,(G^T X) \in \mathbb{R}^{p \times n}$, with $G \in \mathbb{R}^{p \times d}$.
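A minimal sketch of the two maps, assuming G holds the first d eigenvectors of S (synthetic data, illustrative names):

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 60))                  # hypothetical data matrix, p=8, n=60
X = X - X.mean(axis=1, keepdims=True)

S = (X @ X.T) / X.shape[1]
_, evecs = np.linalg.eigh(S)
d = 2
G = evecs[:, ::-1][:, :d]                     # p x d, columns = first d eigenvectors of S

Y = G.T @ X                                   # dimension reduction: d x n
X_hat = G @ Y                                 # reconstruction: p x n
print(Y.shape, X_hat.shape)                   # (2, 60) (8, 60)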
Optimality property of PCA
Main theoretical result:
The matrix G consisting of the first d eigenvectors of the
covariance matrix S solves the following min problem:
$$\min_{G \in \mathbb{R}^{p \times d}} \; \big\| X - G\,(G^T X) \big\|_F^2 \quad \text{subject to } G^T G = I_d$$
$\| X - \hat{X} \|_F^2$ is the reconstruction error.
PCA projection minimizes the reconstruction error among all
linear projections of size d.
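A hedged numerical illustration of this optimality claim (an experiment, not a proof): on synthetic data, the reconstruction error of the PCA projection is compared with that of a random orthonormal projection of the same size d.

import numpy as np

rng = np.random.default_rng(6)
# Hypothetical anisotropic data: 8 variables, 60 samples
X = rng.normal(size=(8, 60)) * np.linspace(3.0, 0.3, 8)[:, None]
X = X - X.mean(axis=1, keepdims=True)

def reconstruction_error(G, X):
    # Squared Frobenius norm ||X - G G^T X||_F^2 for an orthonormal G
    return np.linalg.norm(X - G @ (G.T @ X)) ** 2

S = (X @ X.T) / X.shape[1]
_, evecs = np.linalg.eigh(S)
d = 2
G_pca = evecs[:, ::-1][:, :d]                        # first d eigenvectors of S
G_rand, _ = np.linalg.qr(rng.normal(size=(8, d)))    # a random orthonormal p x d matrix

print(reconstruction_error(G_pca, X), reconstruction_error(G_rand, X))
# The PCA value is the smaller of the two.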
Applications of PCA
• Eigenfaces for recognition. Turk and Pentland. 1991.
• Principal Component Analysis for clustering gene
expression data. Yeung and Ruzzo. 2001.
• Probabilistic Disease Classification of Expression-
Dependent Proteomic Data from Mass Spectrometry of
Human Serum. Lilien. 2003.
Outline of lecture
• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis
• Nonlinear PCA using Kernels
Motivation
Linear projections will not detect the pattern.
Nonlinear PCA using Kernels
• Traditional PCA applies a linear transformation
– May not be effective for nonlinear data
• Solution: apply a nonlinear transformation to a potentially very high-dimensional space:
$$\Phi: x \mapsto \Phi(x)$$
• Computational efficiency: apply the kernel trick.
– Requires that PCA can be rewritten in terms of dot products.
More on kernels later: $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$.
Nonlinear PCA using Kernels
Rewrite PCA in terms of dot products.
Assume the data has been centered, i.e., $\sum_i x_i = 0$.
The covariance matrix S can be written as
$$S = \frac{1}{n} \sum_i x_i x_i^T.$$
Let v be an eigenvector of S corresponding to a nonzero eigenvalue $\lambda$:
$$S v = \frac{1}{n} \sum_i x_i x_i^T v = \lambda v
\;\;\Longrightarrow\;\;
v = \frac{1}{n\lambda} \sum_i (x_i^T v)\, x_i.$$
Eigenvectors of S lie in the space spanned by all data points.
Nonlinear PCA using Kernels
$$S v = \frac{1}{n} \sum_i x_i x_i^T v = \lambda v
\;\;\Longrightarrow\;\;
v = \frac{1}{n\lambda} \sum_i (x_i^T v)\, x_i$$
The covariance matrix can be written in matrix form:
$$S = \frac{1}{n} X X^T, \quad \text{where } X = [x_1, x_2, \ldots, x_n].$$
Write $v = \sum_i \alpha_i x_i = X \alpha$. Then
$$S v = \frac{1}{n} X X^T X \alpha = \lambda X \alpha$$
$$\Longrightarrow\;\; \frac{1}{n} (X^T X)(X^T X)\, \alpha = \lambda\, (X^T X)\, \alpha$$
$$\Longrightarrow\;\; \frac{1}{n} (X^T X)\, \alpha = \lambda\, \alpha$$
Any benefits? Everything is now expressed through $X^T X$, i.e., through the dot products $x_i^T x_j$ only.
Nonlinear PCA using Kernels
Next consider the feature space $\Phi: x \mapsto \Phi(x)$:
$$S^{\Phi} = \frac{1}{n} X_{\Phi} X_{\Phi}^T, \quad \text{where } X_{\Phi} = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)],$$
$$v = \sum_i \alpha_i \Phi(x_i) = X_{\Phi}\, \alpha,
\qquad
\frac{1}{n} X_{\Phi}^T X_{\Phi}\, \alpha = \lambda\, \alpha.$$
The (i, j)-th entry of $X_{\Phi}^T X_{\Phi}$ is $\Phi(x_i) \cdot \Phi(x_j)$.
Apply the kernel trick: $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$.
K is called the kernel matrix:
$$\frac{1}{n} K \alpha = \lambda\, \alpha.$$
Nonlinear PCA using Kernels
• Projection of a test point x onto v:
$$\Phi(x) \cdot v = \Phi(x) \cdot \sum_i \alpha_i \Phi(x_i)
= \sum_i \alpha_i\, \Phi(x) \cdot \Phi(x_i)
= \sum_i \alpha_i\, K(x, x_i)$$
The explicit mapping $\Phi$ is not required here.
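A self-contained sketch of kernel PCA along these lines. The RBF kernel, the function and variable names, and the data are my own illustrative choices; centering of the kernel matrix in feature space and the rescaling of alpha that makes v unit-norm are omitted for brevity.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2); columns of X and Y are samples
    d2 = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(7)
X = rng.normal(size=(2, 100))                 # hypothetical data: 2 variables, 100 samples

K = rbf_kernel(X, X)                          # kernel (Gram) matrix, n x n
# (Centering in feature space is skipped here for brevity.)

_, alpha = np.linalg.eigh(K / K.shape[0])     # solve (1/n) K alpha = lambda alpha
alpha = alpha[:, ::-1]                        # leading eigenvectors first
# (Rescaling alpha so that ||v|| = 1 is also skipped.)

x_test = rng.normal(size=(2, 1))              # a test point
k_vec = rbf_kernel(X, x_test)[:, 0]           # K(x, x_i) for all training samples
projection = alpha[:, 0] @ k_vec              # sum_i alpha_i K(x, x_i)
print(projection)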
Reference
• Principal Component Analysis. I.T. Jolliffe.
• Kernel Principal Component Analysis. Schölkopf, et al.
• Geometric Methods for Feature Extraction and
Dimensional Reduction. Burges.
Principal component analysis (PCA) = K-means clustering
Move the data points of each cluster to the cluster center (assuming each cluster is roughly spherical).
These K cluster centers span the PCA subspace, in p-dim space!
(This can be proven rigorously with mathematics.)
One early major advance using matrix analysis
(Zha, He, Ding, et al., NIPS 2000)
(Ding & He, ICML 2004)
PCA and k-means clustering
- Move every data point to its cluster center.
- The K cluster centers span a cluster subspace ((k-1)-dim, in p-dim space).
- Cluster subspace = PCA subspace (spanned by the 1st k-1 PCA directions).
One early major advance on PCA and K-means (Zha, He, Ding, et al., NIPS 2000)
(Ding & He, ICML 2004)
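A hedged numerical illustration of this claim (an experiment, not the proof given in the cited papers): on well-separated, roughly spherical synthetic clusters, the subspace spanned by the centered K-means cluster centers nearly coincides with the span of the first k-1 principal directions. The scikit-learn and SciPy calls are standard; the data are made up.

import numpy as np
from scipy.linalg import subspace_angles
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
k, p = 3, 5
centers = rng.normal(scale=10.0, size=(k, p))                    # well-separated cluster centers
X = np.vstack([c + rng.normal(size=(200, p)) for c in centers])  # roughly spherical clusters
X = X - X.mean(axis=0)                                           # center the data (rows = samples)

# Subspace spanned by the K cluster centers ((k-1)-dimensional after centering)
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
C = km.cluster_centers_ - km.cluster_centers_.mean(axis=0)
Qc, _ = np.linalg.qr(C.T)
Qc = Qc[:, :k - 1]

# Subspace spanned by the first k-1 principal directions
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Qp = Vt[:k - 1].T

print(np.degrees(subspace_angles(Qc, Qp)))                       # principal angles, close to 0 degrees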
The solution of K-means is represented by the cluster indicator matrix H,
whose j-th column indicates the $n_j$ members of cluster j.
We actually use the scaled indicators Q, where each indicator column is divided by $\sqrt{n_j}$, so that
$$Q^T Q = I.$$