UNIT-5

Dimensionality Reduction

Dimension of an instance:

It is defined as the number of variables in an instance.

Dimensionality Reduction:

 It is the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables.
 It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, and bioinformatics. It can also be used for data visualization, noise reduction, cluster analysis, etc.

Advantages of Reducing Dimension:

o Decreases the complexity of the algorithm
o Saves the cost of extracting unnecessary inputs
o Simpler models can be chosen
o Simplifies knowledge extraction
o Easier to plot and analyze the data

Disadvantages of Dimensionality Reduction:

o Some data may be lost due to dimensionality reduction.

o In the PCA dimensionality reduction technique, the number of principal components to retain is sometimes unknown.

Types and Approaches of Dimensionality Reduction:

1. Feature Selection: Find k of the d dimensions and discard the remaining (d - k) dimensions.
E.g., Subset Selection
2. Feature Extraction: Find a new set of k dimensions that are combinations of the original d dimensions.
E.g., Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA)
SUBSET SELECTION:

o It is also known as variable selection, attribute selection, or feature selection.
o Smaller subsets of features are chosen from the high-dimensional data to represent the model, using filter, wrapper, or embedded methods.

Advantages:

1. Simplification of models

2. Shorter training times

3. Enhanced generalization

4. Avoiding the curse of dimensionality

TYPES:

1.Forward Selection: Start with no variables and add them one by one, at each step adding the one that decreases the error the most, until any further addition does not decrease the error.

NOTATION:

N: the number of input variables

x1, x2, ..., xN: the input variables

Fi: a subset of the set of input variables

E(Fi): the error on the validation sample when only the inputs in Fi are used

Steps:

1. Set F0 = Ø and E(F0) = infinity

2. For i = 0, 1, ... repeat until E(Fi+1) >= E(Fi):

a. For each input variable xj not in Fi, train the model with the input variables Fi U {xj} and calculate E(Fi U {xj}) on the validation set

b. Choose the input variable xm that gives the least error:

m = arg minj E(Fi U {xj})

c. Set Fi+1 = Fi U {xm}

3. The set Fi is output as the best subset (a code sketch of this procedure follows)
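A minimal Python sketch of this procedure, assuming a scikit-learn-style regressor and a held-out validation set; the model choice, data names, and error measure are illustrative assumptions, not part of the notes.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def forward_selection(X_train, y_train, X_val, y_val):
    # Greedily add the variable that lowers validation error the most,
    # stopping when no further addition helps (E(F_{i+1}) >= E(F_i)).
    n_features = X_train.shape[1]
    selected = []                      # F_i: indices chosen so far
    best_error = np.inf                # E(F_0) = infinity
    while True:
        candidate, candidate_error = None, best_error
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]      # F_i U {x_j}
            model = LinearRegression().fit(X_train[:, cols], y_train)
            err = mean_squared_error(y_val, model.predict(X_val[:, cols]))
            if err < candidate_error:  # track arg min_j E(F_i U {x_j})
                candidate, candidate_error = j, err
        if candidate is None:          # no addition decreases the error: stop
            return selected            # F_i is output as the best subset
        selected.append(candidate)     # F_{i+1} = F_i U {x_m}
        best_error = candidate_error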

2.Backward Selection: Start with all variables and remove them one by one, at each step removing the one whose removal decreases the error the most, until any further removal increases the error.

STEPS:

1. Set F0 = {x1, x2, ..., xN} and E(F0) = the error with all inputs

2. For i = 0, 1, ... repeat until E(Fi+1) >= E(Fi):

a. For each input variable xj in Fi, train the model with the input variables Fi - {xj} and calculate E(Fi - {xj}) on the validation set

b. Choose the input variable xm whose removal gives the least error:

m = arg minj E(Fi - {xj})

c. Set Fi+1 = Fi - {xm}

3. The set Fi is output as the best subset (a corresponding sketch follows)
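A corresponding sketch of backward selection under the same assumptions (placeholder model, data, and error measure).

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def backward_selection(X_train, y_train, X_val, y_val):
    # Start from all variables and greedily drop the one whose removal
    # lowers validation error the most; stop when removal no longer helps.
    selected = list(range(X_train.shape[1]))             # F_0 = {x_1, ..., x_N}
    model = LinearRegression().fit(X_train, y_train)
    best_error = mean_squared_error(y_val, model.predict(X_val))
    while len(selected) > 1:
        candidate, candidate_error = None, best_error
        for j in selected:
            cols = [k for k in selected if k != j]        # F_i - {x_j}
            m = LinearRegression().fit(X_train[:, cols], y_train)
            err = mean_squared_error(y_val, m.predict(X_val[:, cols]))
            if err < candidate_error:
                candidate, candidate_error = j, err
        if candidate is None:                             # removal no longer helps
            break
        selected.remove(candidate)                        # F_{i+1} = F_i - {x_m}
        best_error = candidate_error
    return selected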

PRINCIPAL COMPONENT ANALYSIS (PCA):

 Principal Component Analysis is a popular unsupervised learning technique for reducing the dimensionality of data.
 It is a statistical process that converts observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation.
 These new transformed features are called the principal components.
 It is one of the popular tools used for exploratory data analysis and predictive modelling.
 It is a technique to draw out strong patterns from the given dataset by finding a lower-dimensional surface onto which the high-dimensional data is projected.

Principal Component:

 Each principal component is a straight line (direction) that captures most of the variance of the data.
 Principal components have a direction and a magnitude.
 Principal components are orthogonal (perpendicular) projections of the data onto a lower-dimensional space.

APPLICATIONS:

 Image processing and compression
 Movie recommendation systems
 Optimizing power allocation in various communication channels
 Computer vision
 Finding hidden patterns in high-dimensional data

Working of PCA:

1. Normalize the data

Standardize the data before performing PCA. This ensures that each feature has mean = 0 and variance = 1.

2. Build the covariance matrix

Construct a square matrix that expresses the correlation between two or more features in the multidimensional dataset.

3. Find the eigenvectors and eigenvalues

Calculate the eigenvectors (unit vectors) and eigenvalues of the covariance matrix. The eigenvalues are the scalars by which the corresponding eigenvectors of the covariance matrix are scaled.

4. Sort the eigenvectors from highest to lowest eigenvalue and select the number of principal components to keep. A short code sketch of these four steps follows.
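A minimal NumPy sketch of these four steps; the function name, toy data, and the choice of k = 2 are illustrative assumptions, not from the notes.

import numpy as np

def pca(X, k):
    # 1. Normalize: zero mean and unit variance for each feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Build the covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors and eigenvalues of the (symmetric) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort from highest to lowest eigenvalue and keep the top k components
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    return X_std @ components          # data projected onto the principal components

# Example: reduce 5-dimensional random data to 2 principal components
X = np.random.rand(100, 5)
X_reduced = pca(X, k=2)
print(X_reduced.shape)                 # (100, 2)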

FACTOR ANALYSIS:

 Factor Analysis is a linear statistical model.
 It is used to explain the variance among the observed variables and to condense a set of observed variables into unobserved variables called factors.
 Observed variables are modelled as a linear combination of factors and error terms.
 A factor (latent variable) is associated with multiple observed variables that have common patterns of responses.
 Each factor explains a particular amount of variance in the observed variables.
 It helps in data interpretation by reducing the number of variables.
 Example: Analysis of Factor in

Types of Factor Analysis:

There are 4 types:

 Exploratory Factor Analysis (EFA)
 Confirmatory Factor Analysis (CFA)
 Multiple Factor Analysis (MFA)
 Generalized Procrustes Analysis (GPA)

How does Factor Analysis work?

 The goal is to reduce the number of observed variables and find the underlying unobservable variables (factors).
 These factors help the market researcher draw conclusions from a survey.
 This conversion of the observed variables to unobserved variables is achieved in two steps (see the sketch after this list):
o Factor Extraction
o Factor Rotation
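As an illustration of these two steps, scikit-learn's FactorAnalysis estimator supports both extraction and varimax rotation; the dataset and the number of factors below are placeholder assumptions.

from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)                         # 4 observed variables
fa = FactorAnalysis(n_components=2, rotation="varimax")   # factor extraction + rotation
factors = fa.fit_transform(X)                             # scores of the 2 unobserved factors
print(fa.components_.shape)                               # (2, 4): loadings of each factor on the observed variables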

Factor Analysis vs PCA:

Factor Analysis PCA


 Covariance in data.  the maximum
amount of variance

 does not require  fully orthogonal to


factors to be each other
orthogonal

 linear combination of  linear combination of


the unobserved the observed
variables variables while in
FA

 Labelable and  uninterpretable


interpretable

 Latent variable  Dimensionality


method reduction

MULTIDIMENSIONAL SCALING (MDS):

 It is a multivariate data analysis approach that is used to visualize the similarity between samples by plotting points in a two-dimensional plot.
 MDS returns an optimal solution to represent the data in a lower-dimensional space, where the number of dimensions k is specified by the analyst.
 An MDS algorithm takes as input a dissimilarity matrix representing the distances between pairs of objects.
TYPES OF MDS ALGORITHMS:

There are 2 types:

1. Classical (metric) multidimensional scaling

It preserves the original distances between points as closely as possible; the distances are treated as a metric.

2. Non-metric multidimensional scaling

It uses only the ordinal (rank-order) information in the dissimilarity matrix, not the metric distances.

STEPS OF MDS ALGORITHM:

 Assign the points to initial coordinates in n-dimensional space.
 Calculate Euclidean distances for all pairs of points.
 Compare these distances with the input dissimilarities and adjust the coordinates to minimize the discrepancy (the stress); repeat until the stress no longer decreases (a short code sketch follows).
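A brief sketch using scikit-learn's MDS on a precomputed dissimilarity matrix; the toy data and the choice of two output dimensions are assumptions for illustration.

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

X = np.random.rand(20, 6)                        # 20 objects described by 6 variables
D = pairwise_distances(X)                        # dissimilarity (distance) matrix
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)                    # 2-D coordinates for plotting
print(coords.shape)                              # (20, 2)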

MDS and PCA:

MDS:
 Aims to represent similar objects close together in a lower-dimensional space
 Focuses on the relations (distances) among the scaled objects
 Maps n-dimensional data into a 2-dimensional space, so that objects that are similar in the n-dimensional space lie close together in the 2-dimensional plot

PCA:
 Aims to reduce the dimensions of complex data
 Focuses on maximizing the explained variance
 Maps the multidimensional space onto the directions of maximum variability, using the correlation matrix to analyse the relationships between data points and variables

LINEAR DISCRIMINANT ANALYSIS (LDA)

 LDA is a dimensionality reduction technique used as a preprocessing step for pattern classification and machine learning applications.
 LDA is similar to PCA, but in addition it finds the axes that maximize the separation between multiple classes.
 Whenever two or more classes with multiple features need to be separated efficiently, Linear Discriminant Analysis is considered the most common technique for solving such classification problems.
 For example, suppose we have two classes with multiple features and need to separate them efficiently.
 When we classify them using a single feature, the classes may overlap.
 To overcome this overlapping in the classification process, we would have to keep increasing the number of features.

How does LDA work?

 Linear Discriminant Analysis is used as a dimensionality reduction technique in machine learning, with which we can transform a 2-D or 3-D feature space into a 1-dimensional space.
 Suppose we have two classes in a 2-D plane with an X-Y axis, and we need to classify them efficiently.
 LDA enables us to draw a straight line that can completely separate the two classes of data points.
 LDA uses the X-Y axes to create a new axis, separating the classes with a straight line and projecting the data onto this new axis.
 Hence, we can maximize the separation between these classes and reduce the 2-D plane to 1-D.
 To create the new axis, Linear Discriminant Analysis uses the following criteria:
o It maximizes the distance between the means of the two classes.
o It minimizes the variance within each individual class.
 Using these two conditions, LDA generates a new axis that maximizes the distance between the class means while minimizing the variation within each class. A small code sketch follows.
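A small sketch using scikit-learn's LinearDiscriminantAnalysis as a dimensionality reduction step; the dataset and the number of components are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)             # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)               # project onto axes that maximize class separation
print(X_lda.shape)                            # (150, 2): at most (number of classes - 1) components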

APPLICATIONS OF LDA:

 Face recognition
 Medical
 Customer Identification
 For Prediction and Learning

PCA:
 Does not take class labels into account
 Finds the axes that maximize the variance in the given dataset
 Works even with a small sample size

LDA:
 Takes classes or labels into account
 Finds the axes that maximize the separation between different classes of data
 Is suited to multi-class classification tasks
GAUSSIAN DISCRIMINANT ANALYSIS

 Gaussian Discriminant Analysis (GDA) is a Generative Learning Algorithm that aims to determine the distribution of every class.
 It fits a separate Gaussian distribution to each category of data.
 Under this generative model, the likelihood of a sample is highest when it is close to the centre of the contour corresponding to its class.
 The likelihood diminishes as we move away from the centre of the contour.
 To determine P(X|y), we can use the Multivariate Gaussian Distribution to obtain a probability density for each particular class.
 To determine P(y), the class prior for each class, we can use the Bernoulli distribution, since the labels in binary classification can only be 0 or 1.
 So the class-conditional distribution of a sample, as well as its class prior, can be determined using the general model of the Gaussian and Bernoulli distributions.
 In accordance with the principle of Maximum Likelihood Estimation, we select the parameters so as to maximize the likelihood function, as shown in Equation 4.
 Instead of maximizing the likelihood function directly, we can maximize the log-likelihood function, since the logarithm is a strictly increasing function.

Thus, Gaussian Discriminant Analysis works very well with a limited volume of data and may be more robust than Logistic Regression if our underlying assumptions about the data distribution are correct.
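A minimal NumPy sketch of GDA for binary classification; the shared covariance matrix and all function and variable names are assumptions made for illustration, not taken from the notes.

import numpy as np

def fit_gda(X, y):
    # Bernoulli prior P(y), per-class Gaussian means, shared covariance for P(x|y)
    phi = y.mean()                                  # P(y = 1)
    mu0 = X[y == 0].mean(axis=0)                    # mean of class 0
    mu1 = X[y == 1].mean(axis=0)                    # mean of class 1
    centered = np.where(y[:, None] == 1, X - mu1, X - mu0)
    sigma = centered.T @ centered / len(y)          # shared covariance matrix
    return phi, mu0, mu1, sigma

def predict_gda(X, phi, mu0, mu1, sigma):
    # Pick the class with the larger log P(x|y) + log P(y);
    # terms that are identical for both classes cancel in the comparison.
    inv = np.linalg.inv(sigma)
    def log_score(x, mu, prior):
        d = x - mu
        return -0.5 * d @ inv @ d + np.log(prior)
    return np.array([int(log_score(x, mu1, phi) > log_score(x, mu0, 1 - phi)) for x in X])

# Hypothetical usage:
# phi, mu0, mu1, sigma = fit_gda(X_train, y_train)
# y_pred = predict_gda(X_test, phi, mu0, mu1, sigma)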
