UNIT-5

Dimensionality Reduction

Dimension of an instance:

It is defined as the number of variables in an instance.

Dimensionality Reduction:

 It is the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables.
 It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, and bioinformatics. It can also be used for data visualization, noise reduction, cluster analysis, etc.

Advantages of Reducing Dimension:

o Decreases the complexity of the algorithm
o Saves the cost of extracting unnecessary inputs
o Simpler models can be chosen
o Simplifies knowledge extraction
o Easier to plot and analyze the data

Disadvantages of Dimensionality Reduction:

o Some data may be lost due to dimensionality reduction.

o In the PCA dimensionality reduction technique, the number of principal components to retain is sometimes unknown.

Types and Approaches of Dimensionality Reduction:

1. Feature Selection: Find k of the d dimensions and discard the remaining (d - k) dimensions.
E.g., Subset Selection
2. Feature Extraction: Find a new set of k dimensions that are combinations of the original d dimensions.
E.g., Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA)
SUBSET SELECTION:

o It is also known as variable selection, attribute selection, or feature selection.
o Smaller subsets of features are chosen from the high-dimensional data to represent the model, using filter, wrapper, or embedded methods.

Advantages:

1. Simplification of models

2. Shorter training times

3. Enhanced generalization

4. Avoiding the curse of dimensionality

TYPES:

1.Forward Selection: Start with no variables and add them one by one, at each step adding the one that decreases the error the most, until any further addition does not decrease the error.

NOTATION:

N: the number of input variables

x1, x2, ..., xN: the input variables

Fi: a subset of the set of input variables

E(Fi): the error on the validation sample when only the inputs in Fi are used

Steps:

1. Set F0 = Ø and E(F0) = infinity

2. For i = 0, 1, ... repeat until E(Fi+1) >= E(Fi):

a. For each input variable xj not in Fi, train the model with the input variables Fi U {xj} and calculate E(Fi U {xj}) on the validation set

b. Choose the input variable xm that gives the least error:

m = arg minj E(Fi U {xj})

c. Set Fi+1 = Fi U {xm}

3. The set Fi is output as the best subset (a code sketch of this procedure follows)
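A minimal Python sketch of this procedure, assuming a scikit-learn-style regressor and a held-out validation set; the model choice, data names, and error measure are illustrative assumptions, not part of the notes.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def forward_selection(X_train, y_train, X_val, y_val):
    # Greedily add the variable that lowers validation error the most,
    # stopping when no further addition helps (E(F_{i+1}) >= E(F_i)).
    n_features = X_train.shape[1]
    selected = []                      # F_i: indices chosen so far
    best_error = np.inf                # E(F_0) = infinity
    while True:
        candidate, candidate_error = None, best_error
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]      # F_i U {x_j}
            model = LinearRegression().fit(X_train[:, cols], y_train)
            err = mean_squared_error(y_val, model.predict(X_val[:, cols]))
            if err < candidate_error:  # track arg min_j E(F_i U {x_j})
                candidate, candidate_error = j, err
        if candidate is None:          # no addition decreases the error: stop
            return selected            # F_i is output as the best subset
        selected.append(candidate)     # F_{i+1} = F_i U {x_m}
        best_error = candidate_error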

2.Backward Selection: Start with all variables and remove them one by one, at each step removing the one whose removal decreases the error the most, until any further removal increases the error.

STEPS:

1. Set F0 = {x1, x2, ..., xN} and E(F0) = the error with all inputs

2. For i = 0, 1, ... repeat until E(Fi+1) >= E(Fi):

a. For each input variable xj in Fi, train the model with the input variables Fi - {xj} and calculate E(Fi - {xj}) on the validation set

b. Choose the input variable xm whose removal gives the least error:

m = arg minj E(Fi - {xj})

c. Set Fi+1 = Fi - {xm}

3. The set Fi is output as the best subset (a corresponding sketch follows)
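A corresponding sketch of backward selection under the same assumptions (placeholder model, data, and error measure).

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def backward_selection(X_train, y_train, X_val, y_val):
    # Start from all variables and greedily drop the one whose removal
    # lowers validation error the most; stop when removal no longer helps.
    selected = list(range(X_train.shape[1]))             # F_0 = {x_1, ..., x_N}
    model = LinearRegression().fit(X_train, y_train)
    best_error = mean_squared_error(y_val, model.predict(X_val))
    while len(selected) > 1:
        candidate, candidate_error = None, best_error
        for j in selected:
            cols = [k for k in selected if k != j]        # F_i - {x_j}
            m = LinearRegression().fit(X_train[:, cols], y_train)
            err = mean_squared_error(y_val, m.predict(X_val[:, cols]))
            if err < candidate_error:
                candidate, candidate_error = j, err
        if candidate is None:                             # removal no longer helps
            break
        selected.remove(candidate)                        # F_{i+1} = F_i - {x_m}
        best_error = candidate_error
    return selected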

PRINCIPAL COMPONENT ANALYSIS (PCA):

 Principal Component Analysis is a popular unsupervised learning technique for reducing the dimensionality of data.
 It is a statistical process that converts observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation.
 These new transformed features are called the principal components.
 It is one of the popular tools used for exploratory data analysis and predictive modelling.
 It is a technique to draw out strong patterns from the given dataset by finding a lower-dimensional surface onto which the high-dimensional data is projected.

Principal Component:

 Each principal component is a straight line (direction) that captures most of the variance of the data.
 Principal components have a direction and a magnitude.
 Principal components are orthogonal (perpendicular) projections of the data onto a lower-dimensional space.

APPLICATIONS:

 Image processing and compression
 Movie recommendation systems
 Optimizing power allocation in various communication channels
 Computer vision
 Finding hidden patterns in high-dimensional data

Working of PCA:

1. Normalize the data

Standardize the data before performing PCA. This ensures that each feature has mean = 0 and variance = 1.

2. Build the covariance matrix

Construct a square matrix that expresses the correlation between two or more features in the multidimensional dataset.

3. Find the eigenvectors and eigenvalues

Calculate the eigenvectors (unit vectors) and eigenvalues of the covariance matrix. The eigenvalues are the scalars by which the corresponding eigenvectors of the covariance matrix are scaled.

4. Sort the eigenvectors from highest to lowest eigenvalue and select the number of principal components to keep. A short code sketch of these four steps follows.
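A minimal NumPy sketch of these four steps; the function name, toy data, and the choice of k = 2 are illustrative assumptions, not from the notes.

import numpy as np

def pca(X, k):
    # 1. Normalize: zero mean and unit variance for each feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Build the covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors and eigenvalues of the (symmetric) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort from highest to lowest eigenvalue and keep the top k components
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    return X_std @ components          # data projected onto the principal components

# Example: reduce 5-dimensional random data to 2 principal components
X = np.random.rand(100, 5)
X_reduced = pca(X, k=2)
print(X_reduced.shape)                 # (100, 2)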

FACTOR ANALYSIS:

 Factor Analysis is a linear statistical model.
 It is used to explain the variance among the observed variables and to condense a set of observed variables into unobserved variables called factors.
 Observed variables are modelled as a linear combination of factors and error terms.
 A factor (latent variable) is associated with multiple observed variables that have common patterns of responses.
 Each factor explains a particular amount of variance in the observed variables.
 It helps in data interpretation by reducing the number of variables.
 Example: Analysis of Factor in

Types of Factor Analysis:

There are 4 types:

 Exploratory Factor Analysis (EFA)
 Confirmatory Factor Analysis (CFA)
 Multiple Factor Analysis (MFA)
 Generalized Procrustes Analysis (GPA)

How does Factor Analysis work?

 The goal is to reduce the number of observed variables and find the underlying unobservable variables (factors).
 These factors help the market researcher draw conclusions from a survey.
 This conversion of the observed variables to unobserved variables is achieved in two steps (see the sketch after this list):
o Factor Extraction
o Factor Rotation
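As an illustration of these two steps, scikit-learn's FactorAnalysis estimator supports both extraction and varimax rotation; the dataset and the number of factors below are placeholder assumptions.

from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)                         # 4 observed variables
fa = FactorAnalysis(n_components=2, rotation="varimax")   # factor extraction + rotation
factors = fa.fit_transform(X)                             # scores of the 2 unobserved factors
print(fa.components_.shape)                               # (2, 4): loadings of each factor on the observed variables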

Factor Analysis vs PCA:

Factor Analysis PCA


 Covariance in data.  the maximum
amount of variance

 does not require  fully orthogonal to


factors to be each other
orthogonal

 linear combination of  linear combination of


the unobserved the observed
variables variables while in
FA

 Labelable and  uninterpretable


interpretable

 Latent variable  Dimensionality


method reduction

MULTIDIMENSIONAL SCALING (MDS):

 It is a multivariate data analysis approach that is used to visualize the similarity between samples by plotting points in a two-dimensional plot.
 MDS returns an optimal solution to represent the data in a lower-dimensional space, where the number of dimensions k is specified by the analyst.
 An MDS algorithm takes as input a dissimilarity matrix representing the distances between pairs of objects.
TYPES OF MDS ALGORITHMS:

There are 2 types:

1. Classical (metric) multidimensional scaling

It preserves the original distances between points as closely as possible; the distances are treated as a metric.

2. Non-metric multidimensional scaling

It uses only the ordinal (rank-order) information in the dissimilarity matrix, not the metric distances.

STEPS OF MDS ALGORITHM:

 Assign the points to initial coordinates in n-dimensional space.
 Calculate Euclidean distances for all pairs of points.
 Compare these distances with the input dissimilarities and adjust the coordinates to minimize the discrepancy (the stress); repeat until the stress no longer decreases (a short code sketch follows).
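A brief sketch using scikit-learn's MDS on a precomputed dissimilarity matrix; the toy data and the choice of two output dimensions are assumptions for illustration.

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

X = np.random.rand(20, 6)                        # 20 objects described by 6 variables
D = pairwise_distances(X)                        # dissimilarity (distance) matrix
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)                    # 2-D coordinates for plotting
print(coords.shape)                              # (20, 2)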

MDS and PCA:

MDS:
 Aims to represent similar objects close together in a lower-dimensional space
 Focuses on the relations (distances) among the scaled objects
 Maps n-dimensional data into a 2-dimensional space, so that objects that are similar in the n-dimensional space lie close together in the 2-dimensional plot

PCA:
 Aims to reduce the dimensions of complex data
 Focuses on maximizing the explained variance
 Maps the multidimensional space onto the directions of maximum variability, using the correlation matrix to analyse the relationships between data points and variables

LINEAR DISCRIMINANT ANALYSIS (LDA)

 LDA is a dimensionality reduction technique used as a preprocessing step for pattern classification and machine learning applications.
 LDA is similar to PCA, but in addition it finds the axes that maximize the separation between multiple classes.
 Whenever two or more classes with multiple features need to be separated efficiently, Linear Discriminant Analysis is considered the most common technique for solving such classification problems.
 For example, suppose we have two classes with multiple features and need to separate them efficiently.
 When we classify them using a single feature, the classes may overlap.
 To overcome this overlapping in the classification process, we would have to keep increasing the number of features.

How does LDA work?

 Linear Discriminant Analysis is used as a dimensionality reduction technique in machine learning, with which we can transform a 2-D or 3-D feature space into a 1-dimensional space.
 Suppose we have two classes in a 2-D plane with an X-Y axis, and we need to classify them efficiently.
 LDA enables us to draw a straight line that can completely separate the two classes of data points.
 LDA uses the X-Y axes to create a new axis, separating the classes with a straight line and projecting the data onto this new axis.
 Hence, we can maximize the separation between these classes and reduce the 2-D plane to 1-D.
 To create the new axis, Linear Discriminant Analysis uses the following criteria:
o It maximizes the distance between the means of the two classes.
o It minimizes the variance within each individual class.
 Using these two conditions, LDA generates a new axis that maximizes the distance between the class means while minimizing the variation within each class. A small code sketch follows.
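A small sketch using scikit-learn's LinearDiscriminantAnalysis as a dimensionality reduction step; the dataset and the number of components are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)             # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)               # project onto axes that maximize class separation
print(X_lda.shape)                            # (150, 2): at most (number of classes - 1) components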

APPLICATIONS OF LDA:

 Face recognition
 Medical
 Customer Identification
 For Prediction and Learning

PCA:
 Does not take class labels into account
 Finds the axes that maximize the variance in the given dataset
 Works even with a small sample size

LDA:
 Takes classes or labels into account
 Finds the axes that maximize the separation between different classes of data
 Is suited to multi-class classification tasks
GAUSSIAN DISCRIMINANT ANALYSIS

 Gaussian Discriminant Analysis (GDA) is a Generative Learning Algorithm that aims to determine the distribution of every class.
 It fits a separate Gaussian distribution to each category of data.
 Under this generative model, the likelihood of a sample is highest when it is close to the centre of the contour corresponding to its class.
 The likelihood diminishes as we move away from the centre of the contour.
 To determine P(X|y), we can use the Multivariate Gaussian Distribution to obtain a probability density for each particular class.
 To determine P(y), the class prior for each class, we can use the Bernoulli distribution, since the labels in binary classification can only be 0 or 1.
 So the class-conditional distribution of a sample, as well as its class prior, can be determined using the general model of the Gaussian and Bernoulli distributions.
 In accordance with the principle of Maximum Likelihood Estimation, we select the parameters so as to maximize the likelihood function, as shown in Equation 4.
 Instead of maximizing the likelihood function directly, we can maximize the log-likelihood function, since the logarithm is a strictly increasing function.

Thus, Gaussian Discriminant Analysis works very well with a limited volume of data and may be more robust than Logistic Regression if our underlying assumptions about the data distribution are correct.
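A minimal NumPy sketch of GDA for binary classification; the shared covariance matrix and all function and variable names are assumptions made for illustration, not taken from the notes.

import numpy as np

def fit_gda(X, y):
    # Bernoulli prior P(y), per-class Gaussian means, shared covariance for P(x|y)
    phi = y.mean()                                  # P(y = 1)
    mu0 = X[y == 0].mean(axis=0)                    # mean of class 0
    mu1 = X[y == 1].mean(axis=0)                    # mean of class 1
    centered = np.where(y[:, None] == 1, X - mu1, X - mu0)
    sigma = centered.T @ centered / len(y)          # shared covariance matrix
    return phi, mu0, mu1, sigma

def predict_gda(X, phi, mu0, mu1, sigma):
    # Pick the class with the larger log P(x|y) + log P(y);
    # terms that are identical for both classes cancel in the comparison.
    inv = np.linalg.inv(sigma)
    def log_score(x, mu, prior):
        d = x - mu
        return -0.5 * d @ inv @ d + np.log(prior)
    return np.array([int(log_score(x, mu1, phi) > log_score(x, mu0, 1 - phi)) for x in X])

# Hypothetical usage:
# phi, mu0, mu1, sigma = fit_gda(X_train, y_train)
# y_pred = predict_gda(X_test, phi, mu0, mu1, sigma)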
