• The number of input features, variables, or columns present in a given
dataset is known as its dimensionality, and the process of reducing these
features is called dimensionality reduction.
• In many cases a dataset contains a very large number of input features, which
makes the predictive modeling task more complicated; for such cases,
dimensionality reduction techniques are needed.
Wrapper Method:
• Wrapper methods select features based on the performance of a specific machine
learning algorithm: candidate subsets of features are evaluated by training the
model on them, and the best-performing subset is kept.
• Common examples are Recursive Feature Elimination (RFE) and forward/backward selection.
Embedded Method:
• Feature selection using the embedded method combines the
advantages of both filter and wrapper methods.
• In this approach, feature selection occurs as part of the model
training process.
• The model itself selects the most relevant features during training,
based on their contribution to model performance.
• This method is particularly efficient and effective when working
with regularization techniques, as the sketch below illustrates.
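As a rough illustration of the embedded approach, here is a minimal sketch using scikit-learn's Lasso (L1 regularization); the synthetic dataset and the alpha value are illustrative assumptions, not part of the original material.

```python
# Minimal sketch of embedded feature selection with L1 regularization (Lasso).
# The synthetic dataset and the alpha value are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=42)

# The L1 penalty shrinks the coefficients of uninformative features to exactly zero,
# so the selection happens during model training itself.
lasso = Lasso(alpha=1.0).fit(X, y)

selected = np.flatnonzero(lasso.coef_)   # indices of the features that survived
print("Selected feature indices:", selected)
print("Coefficients:", np.round(lasso.coef_, 3))
```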
Feature Selection Methods: Useful Tricks & Tips
Here are some useful tricks and tips for feature selection:
• Understand Your Data: Before selecting features, thoroughly understand your
dataset. Know the domain and the relationships between different features.
• Filter Methods: Use statistical measures like correlation, chi-square, or mutual
information to rank features based on their relevance to the target variable (see the
mutual-information sketch after this list).
• Wrapper Methods: Employ algorithms like Recursive Feature Elimination (RFE) or
Forward/Backward Selection, which select subsets of features based on the
performance of a specific machine learning algorithm (see the RFE sketch after this list).
• Embedded Methods: Some machine learning algorithms inherently perform feature
selection during training. Examples include LASSO (L1 regularization) and tree-
based methods like Random Forests.
*LASSO (Least Absolute Shrinkage and Selection Operator) regression is a type of linear regression that
incorporates regularization to prevent overfitting and enhance model interpretability.
• Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) can reduce
the dimensionality of your data while retaining most of the variance; t-distributed Stochastic
Neighbor Embedding (t-SNE) is mainly useful for low-dimensional visualization, since it
preserves local neighborhood structure rather than overall information.
• Feature Importance: For tree-based algorithms like Random Forest or Gradient
Boosting Machines (GBM), you can use the built-in feature importance attribute to
select the most important features (see the feature-importance sketch after this list).
• Domain Knowledge: Leverage domain expertise to identify features that are likely to
be important. Sometimes, features that seem irrelevant on the surface might be crucial
when considering domain-specific insights.
• Regularization: Regularization techniques like LASSO (L1 regularization) penalize
the absolute size of the coefficients, effectively performing feature selection by driving
some coefficients to zero.
• Cross-Validation: Perform feature selection within each fold of cross-validation to
ensure that your feature selection process is not biased by the specific dataset splits
(see the pipeline sketch after this list).
• Ensemble Methods: Combine the results of multiple feature selection methods to get a
more robust set of selected features.
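A minimal sketch of a filter method, ranking features by mutual information with the target; the dataset and the choice of k = 5 are illustrative assumptions.

```python
# Minimal sketch of a filter method: score each feature's relevance to the target
# with mutual information, then keep the k best. Dataset and k are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Features are scored independently of any downstream model.
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Top-5 feature indices:", selector.get_support(indices=True))
print("Mutual information scores:", selector.scores_.round(3))
```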
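A minimal sketch of a wrapper method using Recursive Feature Elimination (RFE); the estimator, the scaling step, and the number of features kept are illustrative assumptions.

```python
# Minimal sketch of a wrapper method: Recursive Feature Elimination (RFE).
# The estimator, the scaling, and the number of features kept are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaling helps the logistic regression converge

# RFE repeatedly fits the estimator, drops the weakest feature, and refits,
# so the surviving subset is driven by that estimator's learned weights.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)

print("Selected feature indices:", rfe.get_support(indices=True))
print("Feature ranking (1 = kept):", rfe.ranking_)
```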
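A minimal sketch of reading the built-in feature importances of a tree-based model; the dataset and the top-5 cutoff are illustrative assumptions.

```python
# Minimal sketch: built-in feature importances from a tree-based ensemble.
# The dataset and the top-5 cutoff are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Higher importance = the feature contributed more to the forest's splits.
top5 = np.argsort(forest.feature_importances_)[::-1][:5]
print("Top-5 feature indices:", top5)
print("Their importances:", forest.feature_importances_[top5].round(3))
```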
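A minimal sketch of keeping feature selection inside cross-validation by placing the selector in a Pipeline, so that each fold selects features using only its own training split; the selector, estimator, and k are illustrative assumptions.

```python
# Minimal sketch: feature selection performed inside each cross-validation fold.
# The selector (SelectKBest), estimator, and k below are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),  # refit on every training fold
    ("model", LogisticRegression(max_iter=1000)),
])

# cross_val_score refits the whole pipeline per fold, so the feature scores
# never see the corresponding validation data.
scores = cross_val_score(pipe, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
```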
Principal Component Analysis (PCA)
• Principal Component Analysis is a feature extraction method that reduces the
dimensionality of large data sets while preserving the maximum amount of information.
• PCA emphasizes variation and captures important patterns and relationships
between variables in the dataset.
How Does Principal Component Analysis (PCA) Work?
• In general, all the features are not equally important and there are certain
features that account for a large percentage of variance in the dataset.
• The motivation behind the PCA algorithm is that there are certain features
that capture a large percentage of variance in the original dataset.
• So it's important to find the directions of maximum variance in the
dataset.
• These directions are called principal components.
• And PCA is essentially a projection of the dataset onto the principal
components.
• So how do we find the principal components? The next sections walk through the
covariance matrix and its eigenvalues and eigenvectors; a minimal end-to-end sketch
is shown below.
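The sketch centers the data, builds the covariance matrix, keeps the eigenvectors with the largest eigenvalues, and projects onto them; the random toy data and the choice of two components are illustrative assumptions.

```python
# Minimal sketch of PCA "by hand": center the data, compute the covariance matrix,
# take the eigenvectors with the largest eigenvalues, and project onto them.
# The random toy data and the choice of 2 components are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # 100 samples, 5 features

X_centered = X - X.mean(axis=0)           # PCA works on mean-centered data
cov = np.cov(X_centered, rowvar=False)    # 5 x 5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]         # sort directions by explained variance
components = eigvecs[:, order[:2]]        # top-2 principal components

X_reduced = X_centered @ components       # projection onto the principal components
print("Reduced shape:", X_reduced.shape)  # (100, 2)
print("Explained variance ratio:", (eigvals[order[:2]] / eigvals.sum()).round(3))
```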
What is Covariance Matrix?
• The variance-covariance matrix is a square matrix whose diagonal elements are the
variances of the individual variables and whose off-diagonal elements are the
covariances between pairs of variables.
• The covariance between two variables can take any real value: positive, negative, or zero.
• A positive covariance suggests that the two variables tend to increase or decrease together,
whereas a negative covariance indicates an inverse relationship (as one increases, the other
tends to decrease).
• If two variables do not vary together, their covariance is zero.
Example: Find the covariance matrix
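The worked numbers for this example are not reproduced here, so the sketch below uses a small hypothetical dataset; with rowvar=False, np.cov treats each column as one variable and each row as one observation.

```python
# Minimal sketch: covariance matrix of a small hypothetical dataset (3 variables).
# The numbers are illustrative assumptions, not the original example's data.
import numpy as np

# Rows = observations, columns = variables.
data = np.array([
    [2.0, 8.0, 1.0],
    [4.0, 6.0, 3.0],
    [6.0, 4.0, 2.0],
    [8.0, 2.0, 5.0],
])

cov = np.cov(data, rowvar=False)   # diagonal = variances, off-diagonal = covariances
print(np.round(cov, 2))
```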
Example 2: Find the eigenvalues and eigenvectors of the 3 x 3 matrix
    A = |  2   1   3 |
        |  1   2   3 |
        |  3   3  20 |
Sol: The characteristic equation det(A − λI) = 0 gives λ³ − 24λ² + 65λ − 42 = 0, which
factors as (λ − 1)(λ − 2)(λ − 21) = 0. The eigenvalues are therefore λ = 1, 2, and 21, with
corresponding eigenvectors (1, −1, 0), (3, 3, −1), and (1, 1, 6).
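As a quick numerical check of this result (np.linalg.eig returns the eigenvalues in no guaranteed order and scales each eigenvector to unit length):

```python
# Verify the eigenvalues and eigenvectors of the 3 x 3 example matrix with NumPy.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, 2.0, 3.0],
              [3.0, 3.0, 20.0]])

eigvals, eigvecs = np.linalg.eig(A)
print("Eigenvalues:", np.round(eigvals, 6))      # 21, 1 and 2, in no guaranteed order
for i, lam in enumerate(eigvals):
    v = eigvecs[:, i]                            # unit-length eigenvector for eigvals[i]
    print("Residual A v - lambda v:", np.round(A @ v - lam * v, 10))
```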
• Singular Value Decomposition is a matrix factorization technique widely used in various
applications, including linear algebra, signal processing, and machine learning.
• It decomposes a matrix into three other matrices, allowing for the representation of the
original matrix in a reduced form.
Decomposition of a Matrix:
Given a matrix M of size m x n (or a data frame with m rows and n columns),
SVD decomposes it into three matrices:
M = U * Σ * Vᵗ,
where U is an m x m orthogonal matrix, Σ is an m x n diagonal matrix, and
V is an n x n orthogonal matrix.
The number of non-zero singular values in Σ equals r, the rank of the matrix M.
• The diagonal elements of Σ are the singular values of the original matrix M,
and they are arranged in descending order.
• The columns of U are the left singular vectors of M. The left singular vectors
associated with the non-zero singular values form an orthonormal basis for the
column space of M.
• The columns of V are the right singular vectors of M.
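A minimal NumPy sketch of the decomposition and of a low-rank truncation; the 4 x 3 matrix and the choice k = 2 are illustrative assumptions.

```python
# Minimal sketch of SVD with NumPy: decompose M and rebuild a rank-k approximation.
# The matrix M and the choice k = 2 are illustrative assumptions.
import numpy as np

M = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0],
              [ 1.0, 1.0, 3.0],
              [ 2.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(M, full_matrices=True)
print("U:", U.shape, "singular values:", np.round(s, 3), "Vt:", Vt.shape)

# Rebuild M exactly from the three factors (Sigma padded out to m x n).
Sigma = np.zeros(M.shape)
np.fill_diagonal(Sigma, s)
print("Max reconstruction error:", np.abs(U @ Sigma @ Vt - M).max())

# Keep only the k largest singular values for a reduced representation of M.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("Rank-2 approximation:\n", np.round(M_k, 3))
```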
In summary,
• PCA is suitable for unsupervised dimensionality reduction,
• LDA is effective for supervised problems with a focus on class separability, and
• SVD is versatile, catering to various applications including collaborative filtering and matrix
factorization.
The choice depends on the nature of your data and the goals of your analysis.