• The number of input features, variables, or columns present in a given
dataset is known as its dimensionality, and the process of reducing these
features is called dimensionality reduction.
• In many cases a dataset contains a very large number of input features, which
makes the predictive modeling task more complicated; for such cases,
dimensionality reduction techniques are needed.
Wrapper Method:
• Wrapper methods select features based on the performance of a specific machine
learning algorithm: candidate subsets of features are evaluated by training the
model on them, and the best-performing subset is kept.
• Common examples are Recursive Feature Elimination (RFE) and forward/backward selection.
Embedded Method:
• Feature selection using the embedded method combines the
advantages of both filter and wrapper methods.
• In this approach, feature selection occurs as part of the model
training process.
• The model itself selects the most relevant features during training,
based on their contribution to model performance.
• This method is particularly efficient and effective when working
with regularization techniques, as the sketch below illustrates.
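As a rough illustration of the embedded approach, here is a minimal sketch using scikit-learn's Lasso (L1 regularization); the synthetic dataset and the alpha value are illustrative assumptions, not part of the original material.

```python
# Minimal sketch of embedded feature selection with L1 regularization (Lasso).
# The synthetic dataset and the alpha value are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=42)

# The L1 penalty shrinks the coefficients of uninformative features to exactly zero,
# so the selection happens during model training itself.
lasso = Lasso(alpha=1.0).fit(X, y)

selected = np.flatnonzero(lasso.coef_)   # indices of the features that survived
print("Selected feature indices:", selected)
print("Coefficients:", np.round(lasso.coef_, 3))
```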
Feature Selection Methods: Useful Tricks & Tips
Here are some useful tricks and tips for feature selection:
• Understand Your Data: Before selecting features, thoroughly understand your
dataset. Know the domain and the relationships between different features.
• Filter Methods: Use statistical measures like correlation, chi-square, or mutual
information to rank features based on their relevance to the target variable (see the
mutual-information sketch after this list).
• Wrapper Methods: Employ algorithms like Recursive Feature Elimination (RFE) or
Forward/Backward Selection, which select subsets of features based on the
performance of a specific machine learning algorithm (see the RFE sketch after this list).
• Embedded Methods: Some machine learning algorithms inherently perform feature
selection during training. Examples include LASSO (L1 regularization) and tree-
based methods like Random Forests.
*LASSO (Least Absolute Shrinkage and Selection Operator) regression is a type of linear regression that
incorporates regularization to prevent overfitting and enhance model interpretability.
• Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) can reduce
the dimensionality of your data while retaining most of the variance; t-distributed Stochastic
Neighbor Embedding (t-SNE) is mainly useful for low-dimensional visualization, since it
preserves local neighborhood structure rather than overall information.
• Feature Importance: For tree-based algorithms like Random Forest or Gradient
Boosting Machines (GBM), you can use the built-in feature importance attribute to
select the most important features (see the feature-importance sketch after this list).
• Domain Knowledge: Leverage domain expertise to identify features that are likely to
be important. Sometimes, features that seem irrelevant on the surface might be crucial
when considering domain-specific insights.
• Regularization: Regularization techniques like LASSO (L1 regularization) penalize
the absolute size of the coefficients, effectively performing feature selection by driving
some coefficients to zero.
• Cross-Validation: Perform feature selection within each fold of cross-validation to
ensure that your feature selection process is not biased by the specific dataset splits
(see the pipeline sketch after this list).
• Ensemble Methods: Combine the results of multiple feature selection methods to get a
more robust set of selected features.
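A minimal sketch of a filter method, ranking features by mutual information with the target; the dataset and the choice of k = 5 are illustrative assumptions.

```python
# Minimal sketch of a filter method: score each feature's relevance to the target
# with mutual information, then keep the k best. Dataset and k are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Features are scored independently of any downstream model.
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Top-5 feature indices:", selector.get_support(indices=True))
print("Mutual information scores:", selector.scores_.round(3))
```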
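A minimal sketch of a wrapper method using Recursive Feature Elimination (RFE); the estimator, the scaling step, and the number of features kept are illustrative assumptions.

```python
# Minimal sketch of a wrapper method: Recursive Feature Elimination (RFE).
# The estimator, the scaling, and the number of features kept are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaling helps the logistic regression converge

# RFE repeatedly fits the estimator, drops the weakest feature, and refits,
# so the surviving subset is driven by that estimator's learned weights.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)

print("Selected feature indices:", rfe.get_support(indices=True))
print("Feature ranking (1 = kept):", rfe.ranking_)
```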
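A minimal sketch of reading the built-in feature importances of a tree-based model; the dataset and the top-5 cutoff are illustrative assumptions.

```python
# Minimal sketch: built-in feature importances from a tree-based ensemble.
# The dataset and the top-5 cutoff are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Higher importance = the feature contributed more to the forest's splits.
top5 = np.argsort(forest.feature_importances_)[::-1][:5]
print("Top-5 feature indices:", top5)
print("Their importances:", forest.feature_importances_[top5].round(3))
```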
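A minimal sketch of keeping feature selection inside cross-validation by placing the selector in a Pipeline, so that each fold selects features using only its own training split; the selector, estimator, and k are illustrative assumptions.

```python
# Minimal sketch: feature selection performed inside each cross-validation fold.
# The selector (SelectKBest), estimator, and k below are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),  # refit on every training fold
    ("model", LogisticRegression(max_iter=1000)),
])

# cross_val_score refits the whole pipeline per fold, so the feature scores
# never see the corresponding validation data.
scores = cross_val_score(pipe, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
```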
Principal Component Analysis (PCA)
• Principal Component Analysis is a feature extraction method that reduces the
dimensionality of large data sets while preserving the maximum amount of information.
• PCA emphasizes variation and captures important patterns and relationships
between variables in the dataset.
How Does Principal Component Analysis (PCA) Work?
• In general, all the features are not equally important and there are certain
features that account for a large percentage of variance in the dataset.
• The motivation behind the PCA algorithm is that there are certain features
that capture a large percentage of variance in the original dataset.
• So it's important to find the directions of maximum variance in the
dataset.
• These directions are called principal components.
• And PCA is essentially a projection of the dataset onto the principal
components.
• So how do we find the principal components? The next sections walk through the
covariance matrix and its eigenvalues and eigenvectors; a minimal end-to-end sketch
is shown below.
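The sketch centers the data, builds the covariance matrix, keeps the eigenvectors with the largest eigenvalues, and projects onto them; the random toy data and the choice of two components are illustrative assumptions.

```python
# Minimal sketch of PCA "by hand": center the data, compute the covariance matrix,
# take the eigenvectors with the largest eigenvalues, and project onto them.
# The random toy data and the choice of 2 components are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # 100 samples, 5 features

X_centered = X - X.mean(axis=0)           # PCA works on mean-centered data
cov = np.cov(X_centered, rowvar=False)    # 5 x 5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]         # sort directions by explained variance
components = eigvecs[:, order[:2]]        # top-2 principal components

X_reduced = X_centered @ components       # projection onto the principal components
print("Reduced shape:", X_reduced.shape)  # (100, 2)
print("Explained variance ratio:", (eigvals[order[:2]] / eigvals.sum()).round(3))
```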
What is Covariance Matrix?
• The variance-covariance matrix is a square matrix whose diagonal elements are the
variances of the individual variables and whose off-diagonal elements are the
covariances between pairs of variables.
• The covariance between two variables can take any real value: positive, negative, or zero.
• A positive covariance suggests that the two variables tend to increase or decrease together,
whereas a negative covariance indicates an inverse relationship (as one increases, the other
tends to decrease).
• If two variables do not vary together, their covariance is zero.
Example: Find the covariance matrix
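The worked numbers for this example are not reproduced here, so the sketch below uses a small hypothetical dataset; with rowvar=False, np.cov treats each column as one variable and each row as one observation.

```python
# Minimal sketch: covariance matrix of a small hypothetical dataset (3 variables).
# The numbers are illustrative assumptions, not the original example's data.
import numpy as np

# Rows = observations, columns = variables.
data = np.array([
    [2.0, 8.0, 1.0],
    [4.0, 6.0, 3.0],
    [6.0, 4.0, 2.0],
    [8.0, 2.0, 5.0],
])

cov = np.cov(data, rowvar=False)   # diagonal = variances, off-diagonal = covariances
print(np.round(cov, 2))
```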
Example 2: Find the eigenvalues and eigenvectors of the 3 x 3 matrix
    A = |  2   1   3 |
        |  1   2   3 |
        |  3   3  20 |
Sol: The characteristic equation det(A − λI) = 0 gives λ³ − 24λ² + 65λ − 42 = 0, which
factors as (λ − 1)(λ − 2)(λ − 21) = 0. The eigenvalues are therefore λ = 1, 2, and 21, with
corresponding eigenvectors (1, −1, 0), (3, 3, −1), and (1, 1, 6).
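As a quick numerical check of this result (np.linalg.eig returns the eigenvalues in no guaranteed order and scales each eigenvector to unit length):

```python
# Verify the eigenvalues and eigenvectors of the 3 x 3 example matrix with NumPy.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, 2.0, 3.0],
              [3.0, 3.0, 20.0]])

eigvals, eigvecs = np.linalg.eig(A)
print("Eigenvalues:", np.round(eigvals, 6))      # 21, 1 and 2, in no guaranteed order
for i, lam in enumerate(eigvals):
    v = eigvecs[:, i]                            # unit-length eigenvector for eigvals[i]
    print("Residual A v - lambda v:", np.round(A @ v - lam * v, 10))
```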
• Singular Value Decomposition is a matrix factorization technique widely used in various
applications, including linear algebra, signal processing, and machine learning.
• It decomposes a matrix into three other matrices, allowing for the representation of the
original matrix in a reduced form.
Decomposition of a Matrix:
Given a matrix M of size m x n (or a data frame with m rows and n columns),
SVD decomposes it into three matrices:
M = U * Σ * Vᵗ,
where U is an m x m orthogonal matrix, Σ is an m x n diagonal matrix, and
V is an n x n orthogonal matrix.
The number of non-zero singular values in Σ equals r, the rank of the matrix M.
• The diagonal elements of Σ are the singular values of the original matrix M,
and they are arranged in descending order.
• The columns of U are the left singular vectors of M. The left singular vectors
associated with the non-zero singular values form an orthonormal basis for the
column space of M.
• The columns of V are the right singular vectors of M.
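A minimal NumPy sketch of the decomposition and of a low-rank truncation; the 4 x 3 matrix and the choice k = 2 are illustrative assumptions.

```python
# Minimal sketch of SVD with NumPy: decompose M and rebuild a rank-k approximation.
# The matrix M and the choice k = 2 are illustrative assumptions.
import numpy as np

M = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0],
              [ 1.0, 1.0, 3.0],
              [ 2.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(M, full_matrices=True)
print("U:", U.shape, "singular values:", np.round(s, 3), "Vt:", Vt.shape)

# Rebuild M exactly from the three factors (Sigma padded out to m x n).
Sigma = np.zeros(M.shape)
np.fill_diagonal(Sigma, s)
print("Max reconstruction error:", np.abs(U @ Sigma @ Vt - M).max())

# Keep only the k largest singular values for a reduced representation of M.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("Rank-2 approximation:\n", np.round(M_k, 3))
```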
In summary,
• PCA is suitable for unsupervised dimensionality reduction,
• LDA is effective for supervised problems with a focus on class separability, and
• SVD is versatile, catering to various applications including collaborative filtering and matrix
factorization.
The choice depends on the nature of your data and the goals of your analysis.