Role of Linear Algebra in ML
What is Linear Algebra?
It is a branch of mathematics that allows us to define and perform
operations on higher-dimensional coordinates and plane
interactions in a concise way. Linear algebra extends ordinary
algebra to an arbitrary number of dimensions and is chiefly
concerned with systems of linear equations.
Properties of Linear Algebra:
Associative Property: It is a property in Mathematics
which states that if a, b and c are mathematical objects
then a + (b + c) = (a + b) + c, in which + is a binary
operation.
Commutative Property: It is a property in Mathematics
which states that if a and b are mathematical objects
then a + b = b + a in which + is a binary operation.
Distributive Property: It is a property in Mathematics
which states that if a, b and c are mathematical objects
then a * (b + c) = (a * b) + (a * c), in which * and + are
binary operations.
Linear Algebra for Machine learning
In the context of Machine Learning, linear algebra is employed to
model and analyze relationships within data. It enables the
representation of data points as vectors and allows for efficient
computation of operations on these vectors. Linear
transformations, matrices, and vector spaces play a significant
role in defining and solving problems in ML.
The utilization of linear algebra in ML extends to solving systems
of linear equations, optimizing models, and comprehending
transformations inherent in algorithms like Principal Component
Analysis (PCA). The integration of linear algebra in ML provides a
powerful and versatile mathematical toolbox, to model, analyze,
and optimize complex relationships within data, thereby
advancing the capabilities of machine learning algorithms.
Importance of Linear Algebra in Machine Learning:
Linear algebra is fundamental to machine learning due to its role
in representing and solving systems of equations, defining
transformations, and optimizing algorithms. It provides a
mathematical framework for understanding and working with
high-dimensional data, making it a cornerstone for various
machine learning models and techniques.
Different ways to represent the Data in Linear Algebra
Linear algebra allows the representation of data using scalars and
vectors, enabling efficient storage and manipulation of large
datasets.
Linear Algebra concepts used in Machine Learning for
Representation of Data:
Scalar and Vector
Scalar:
It is a physical quantity described by a single element;
it has only magnitude and no direction. Basically, a
scalar is just a single number.
Example: 17 and 256
Vector:
It is a geometric object having both magnitude and
direction. It is an ordered array of numbers, always
arranged in a row or a column. A vector has just one index, which
can refer to a particular value within the vector.
V = [e1, e2, e3, e4]
Here V is a vector in which e1, e2, e3 and e4 are its
elements, and V[2] is e3.
Vector Operations
1. Scalar-Vector Multiplication
p = [e1, e2, e3]
The product of a scalar with a vector gives the result below.
When the scalar 2 is multiplied by a vector p, all the
elements of the vector p are multiplied by that scalar. This
operation satisfies the commutative property.
p * 2 = [2 * e1, 2 * e2, 2 * e3]
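A minimal sketch of the same operation in Python with NumPy (the element values below are arbitrary placeholders):

```python
import numpy as np

# Scalar-vector multiplication: every element of the vector is scaled.
p = np.array([1.0, 2.0, 3.0])   # stands in for [e1, e2, e3]
print(2 * p)                    # [2. 4. 6.]
print(p * 2)                    # same result -- the operation is commutative
```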
Matrix
It is an ordered 2D array of numbers, symbols or expressions
arranged in rows and columns. It has two indices: the first index
points to the row, and the second index points to the column. A
matrix can have any number of rows and columns.
M = [[e1, e2], [e3, e4]]
Above, M is a 2D matrix having e1, e2, e3 and e4 as elements,
and M[1][0] is e3.
A matrix having its main-diagonal elements equal to 1 and all other
elements equal to 0 is an Identity matrix.
Example:
[[1, 0], [0, 1]] is the 2 x 2 Identity Matrix, and [[1, 0, 0], [0, 1, 0], [0, 0, 1]] is
the 3 x 3 Identity Matrix.
Tensor
It is an algebraic object representing a linear mapping of
algebraic objects from one set to another. It is actually a 3D
array of numbers with a variable number of axes, arranged on a
regular grid. A tensor has three indices, first index points to the
row, the second index points to the column and the third
index points to the axis.
Here the tensor T = [e1, e2, e3, e4, e5, e6, e7, e8] has 8 elements arranged as a
three-dimensional tensor with dimensions 2 x 2 x 2, such that
T[1][1][1] = e8.
Tensors play a significant role in machine learning, particularly in
deep learning, due to their ability to represent and manage
multi-dimensional data.
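As a quick sketch, such a tensor can be built as a 3D NumPy array (the element values are placeholders):

```python
import numpy as np

# A 2 x 2 x 2 tensor holding eight elements, addressed by three indices.
T = np.arange(1, 9).reshape(2, 2, 2)   # elements e1..e8 represented as 1..8
print(T.shape)        # (2, 2, 2)
print(T[1, 1, 1])     # 8 -- the last element, analogous to e8 above
```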
Linear Algebra Operations:
Machine learning models often involve transformations of data.
Linear algebra provides a concise way to represent and analyze
these transformations using matrices and linear operators.
Matrix Operations:
1. Scalar-Matrix Multiplication
When the scalar a is multiplied by a matrix p, all the
elements of the matrix p are multiplied by that scalar. Scalar-
matrix multiplication is associative, distributive and
commutative.
p = [[e1, e2], [e3, e4]] and a is a scalar.
p * a = [[a * e1, a * e2], [a * e3, a * e4]]
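A minimal NumPy sketch of scalar-matrix multiplication (placeholder values):

```python
import numpy as np

# Scalar-matrix multiplication: every entry of the matrix is scaled.
p = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # stands in for [[e1, e2], [e3, e4]]
a = 3.0
print(a * p)                      # [[ 3.  6.] [ 9. 12.]]
print(np.allclose(a * p, p * a))  # True -- commutative
```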
2. Matrix-Matrix Addition
In order to add matrices, the dimensions (rows and columns) of the matrices
should be equal. Each element of the first matrix is added to
the corresponding element of the other matrix, i.e. the element with the same
row and column indices. Matrix-Matrix addition is associative
and commutative. The addition of
matrix m1 = [[a, b], [c, d]] and m2 = [[p, q], [r, s]] gives the result below.
m1 + m2 = [[a + p, b + q], [c + r, d + s]]
3. Matrix-Matrix Subtraction
Each element of the first matrix is subtracted from the corresponding
element of the other matrix, i.e. the element with the same row and column
indices. As with addition, the matrices must have equal dimensions
in order to be subtracted; unlike addition, however, matrix subtraction
is neither associative nor commutative. The subtraction
of matrix m2 from m1 gives the result below:
m1 - m2 = [[a - p, b - q], [c - r, d - s]]
4. Matrix-Matrix Multiplication
To multiply two matrices, the number of columns of the
first matrix should be equal to the number of rows in the
second matrix. Matrix-Matrix multiplication is associative
and distributive but not commutative. The product of
matrix m1 = [[a, b], [c, d]] and m2 = [[p, q], [r, s]] is given below:
m1 * m2 = [[(a * p) + (b * r), (a * q) + (b * s)], [(c * p) + (d * r), (c * q) + (d * s)]]
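The three matrix-matrix operations above can be sketched in NumPy as follows (placeholder values; `@` denotes the matrix product):

```python
import numpy as np

m1 = np.array([[1.0, 2.0],
               [3.0, 4.0]])            # stands in for [[a, b], [c, d]]
m2 = np.array([[5.0, 6.0],
               [7.0, 8.0]])            # stands in for [[p, q], [r, s]]

print(m1 + m2)                          # element-wise addition
print(m1 - m2)                          # element-wise subtraction
print(m1 @ m2)                          # rows of m1 dotted with columns of m2
print(np.allclose(m1 @ m2, m2 @ m1))    # False -- multiplication is not commutative
```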
Vector-Matrix Operations (Vector-Matrix Multiplication):
The number of columns of the matrix must be equal to the number of
elements of the vector; only then can they be multiplied. Vector-
Matrix multiplication is associative and distributive but not
commutative.
Multiplying a matrix p with a vector q gives the product below:
p = [[e1, e2], [e3, e4], [e5, e6]], q = [a, b]
p * q = [(e1 * a) + (e2 * b), (e3 * a) + (e4 * b), (e5 * a) + (e6 * b)]
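A small NumPy sketch of the same matrix-vector product (placeholder values):

```python
import numpy as np

p = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])      # 3 x 2 matrix, as in [[e1, e2], [e3, e4], [e5, e6]]
q = np.array([10.0, 20.0])      # vector [a, b]

print(p @ q)                    # [ 50. 110. 170.] -- each row of p dotted with q
```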
Transpose
The transpose of a matrix generates a new matrix in which the
rows become columns and columns become rows of the original
matrix. Transposition is vital for tasks like computing correlations
and solving linear equations.
The transpose of an m x n matrix is an n x m matrix.
m = [[a, b], [c, d]], Transpose(m) = [[a, c], [b, d]]
Inverse
The inverse of a matrix is the matrix that, when multiplied with the
original matrix, gives the Identity matrix as the product. If m is a
matrix and n is the inverse matrix of m, then m*n = I, in
which I represents the Identity matrix.
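Both operations are available directly in NumPy; a minimal sketch (the matrix entries are arbitrary):

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(m.T)                              # transpose: rows become columns
n = np.linalg.inv(m)                    # inverse (exists because det(m) != 0)
print(np.allclose(m @ n, np.eye(2)))    # True -- m times its inverse is the identity
```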
Eigenvalues and Eigenvectors:
Understanding eigenvectors and eigenvalues provides insights
into the behavior of linear transformations and is foundational in
various fields, especially in the analysis of square matrices.
In linear algebra, eigenvectors and eigenvalues are
crucial for diagonalizing matrices. Diagonalization
simplifies matrix operations, making computations more
efficient.
They are used in various applications, such as principal
component analysis (PCA) in data analysis and solving
systems of linear differential equations.
They capture the intrinsic properties of a transformation
or dataset.
Eigenvectors
Eigenvectors are special vectors that remain in the same
direction after a linear transformation. When a matrix A is
multiplied by its corresponding eigenvector (v), the result is a
scaled version of the original vector, i.e.,
Av = λv,
where λ is the eigenvalue and v is the eigenvector.
Eigenvectors are essentially the “directions” that remain
unchanged, only scaled, when a transformation is applied.
Eigenvalues
Eigenvalues λ are the scaling factors by which the eigenvectors
are stretched or compressed during a linear transformation. They
represent how much the eigenvector is “stretched” or “shrunk”
by the linear transformation. Larger eigenvalues indicate a
greater stretching, and smaller eigenvalues indicate
compression.
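A short NumPy sketch that checks the defining relation Av = λv on an arbitrary example matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]              # first eigenvector (a column of the result)
lam = eigenvalues[0]                # its eigenvalue

print(np.allclose(A @ v, lam * v))  # True -- A v equals lambda v
```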
Matrix Factorization
Matrix decomposition techniques such as SVD are among the most
widely used tools of linear algebra in machine learning.
Singular Value Decomposition (SVD) is a powerful technique
for decomposing a matrix into three constituent matrices: U, S,
and V^T. These matrices can be used to represent the original matrix in a
more compact and informative way.
SVD has a wide range of applications in machine learning,
including:
Data compression, noise reduction by discarding the
smaller singular values and their corresponding singular
vectors to reduce the storage requirements for data
without significantly affecting its quality.
Dimensionality reduction by keeping only the most
important singular values and their corresponding
singular vectors useful for tasks such as data
visualization and feature extraction.
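One way to sketch the compression idea is a truncated SVD in NumPy, keeping only the k largest singular values (random data used purely for illustration):

```python
import numpy as np

A = np.random.rand(6, 4)                       # example data matrix
U, S, VT = np.linalg.svd(A, full_matrices=False)

k = 2                                          # keep only the k largest singular values
A_approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k]  # rank-k reconstruction of A
print(np.linalg.norm(A - A_approx))            # reconstruction error
```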
Linear Algebra in Machine Learning:
Datasets in machine learning serve as the foundation for model
training and evaluation. These datasets are essentially matrices,
where each row represents a unique observation or data point,
and each column represents a specific feature or variable. The
tabular structure of datasets aligns with the principles of linear
algebra, where matrices are fundamental entities.
Linear algebra provides the tools to manipulate and transform
these datasets efficiently. Operations like matrix multiplication,
addition, and decomposition are crucial for tasks such as feature
engineering, data preprocessing, and computing various
statistical measures. The representation of datasets as matrices
allows for seamless integration of linear algebra techniques into
the machine learning workflow.
One-hot Encoding
In machine learning, dealing with categorical variables
often involves converting them into a numerical format,
and one-hot encoding is a prevalent technique for this
purpose. It transforms categorical variables into binary
vectors, where each category is represented by a
column, and the presence or absence of that category is
indicated by binary values.
The resulting one-hot encoded representation can be
viewed as a sparse matrix, where most elements are
zero, and linear algebra’s vector representation becomes
evident. This compact encoding simplifies the handling of
categorical data in machine learning algorithms,
facilitating efficient computations and reducing the risk
of bias associated with numerical encodings.
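A hand-rolled sketch of one-hot encoding in NumPy (in practice a library helper such as scikit-learn's OneHotEncoder would typically be used; the category names here are made up):

```python
import numpy as np

categories = ["red", "green", "blue", "green"]
labels = sorted(set(categories))                      # ['blue', 'green', 'red']
index = {label: i for i, label in enumerate(labels)}

one_hot = np.zeros((len(categories), len(labels)))
for row, value in enumerate(categories):
    one_hot[row, index[value]] = 1.0                  # mark the category's column

print(one_hot)   # each row is a binary vector with exactly one 1
```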
Linear Regression
Linear regression is a fundamental machine learning
algorithm, and its implementation underscores the
importance of linear algebra in the field. Linear algebra
provides the mathematical foundation for understanding
and solving the equations involved in linear regression.
The use of matrices and vectors simplifies the
formulation and computation, making the
implementation more efficient and scalable.
Understanding linear algebra is essential for grasping the
underlying principles of linear regression and other
machine learning algorithms.
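As one illustration, ordinary least squares can be solved with the normal equation, theta = (X^T X)^(-1) X^T y, using nothing but the matrix operations covered above (synthetic data, sketch only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.random(100)])  # bias column plus one feature
true_theta = np.array([2.0, 3.0])
y = X @ true_theta + 0.1 * rng.standard_normal(100)   # noisy targets

theta = np.linalg.inv(X.T @ X) @ X.T @ y              # normal equation
print(theta)                                          # close to [2, 3]
```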
Regularization
Regularization methods act as a form of constraint on the
model’s complexity, encouraging simpler models with
smaller coefficients. The elegant integration of linear
algebra concepts into regularization
techniques highlights the synergy between mathematical
principles and practical machine learning challenges.
The regularization term in both L1 and L2 regularization
is essentially a measure of the magnitude or length of
the coefficient vector, a concept directly borrowed from
linear algebra. In the case of L2 regularization, the
penalty term is proportional to the Euclidean norm (L2
norm) of the coefficient vector, emphasizing the role of
linear algebra’s vector norms in regularization.
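For example, ridge (L2-regularized) regression has the closed form theta = (X^T X + alpha * I)^(-1) X^T y, where the alpha term penalizes the L2 norm of the coefficient vector (synthetic data, sketch only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.random(100)

alpha = 1.0                                            # regularization strength
theta = np.linalg.inv(X.T @ X + alpha * np.eye(X.shape[1])) @ X.T @ y
print(np.linalg.norm(theta))                           # the penalty shrinks this L2 norm
```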
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) stands out as a
powerful dimensionality reduction technique widely used
in machine learning and data analysis. Its primary
objective is to transform a high-dimensional dataset into
a lower-dimensional representation while retaining as
much variability as possible.
At its core, PCA involves the computation of eigenvectors
and eigenvalues of the dataset’s covariance matrix—a
task that aligns with linear algebra principles. The
covariance matrix captures the relationships between
different features, and its eigenvectors represent the
principal components, or the directions of maximum
variance.
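A from-scratch sketch of PCA along these lines, using the eigen-decomposition of the covariance matrix (random data for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 4))                          # 200 observations, 4 features

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)            # 4 x 4 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh: covariance matrix is symmetric

order = np.argsort(eigenvalues)[::-1]             # sort by decreasing variance
components = eigenvectors[:, order[:2]]           # top two principal components
X_reduced = X_centered @ components               # project onto a 2-D subspace
print(X_reduced.shape)                            # (200, 2)
```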
Images and Photographs
Images and photographs, vital components of computer
vision applications, are inherently structured as matrices
of pixel values. Each pixel’s position corresponds to a
specific element in the matrix, and its intensity is
encoded as the value of that element. Linear algebra
operations play a central role in image processing tasks,
such as scaling, rotating, and filtering.
Transformations applied to images can be represented as
matrix operations, making linear algebra an essential
tool in image manipulation. For instance, a rotation
transformation can be expressed as a matrix
multiplication, showcasing the versatility of linear
algebra in handling image data.
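A minimal sketch of that rotation idea, applied to a single pixel coordinate (a full image pipeline would also resample intensities, which is omitted here):

```python
import numpy as np

theta = np.deg2rad(90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # 2-D rotation matrix

point = np.array([1.0, 0.0])                      # a pixel location
print(R @ point)                                  # approximately [0, 1] -- rotated 90 degrees
```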
Deep Learning
Deep learning, characterized by artificial neural
networks (ANNs) with multiple layers, relies extensively
on linear algebra structures for both model
representation and training. ANNs process information
through interconnected nodes organized in layers, where
each connection is associated with a weight.
The fundamental operations within a neural network—
matrix multiplications and element-wise activations—are
inherently linear algebraic. The input layer, hidden
layers, and output layer collectively involve manipulating
vectors, matrices, and tensors.
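A single dense layer, sketched as a matrix multiplication followed by an element-wise ReLU activation (shapes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((32, 10))           # batch of 32 inputs with 10 features
W = rng.random((10, 5))            # weight matrix of one hidden layer
b = np.zeros(5)                    # bias vector

hidden = np.maximum(0, x @ W + b)  # ReLU(xW + b)
print(hidden.shape)                # (32, 5)
```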
Conclusion:
Linear algebra is the cornerstone of mathematical concepts in
machine learning. A solid grasp of vectors, matrices, and
operations like matrix multiplication is essential for
understanding algorithms, developing models, and navigating
the intricacies of data transformations. Aspiring machine
learning practitioners benefit immensely from a strong
foundation in linear algebra, enhancing their ability to innovate
and contribute to this dynamic field.
Introduction
Entropy is one of the key aspects of Machine Learning. It is a must to know for
anyone who wants to make a mark in Machine Learning and yet it perplexes
many of us. The focus of this article is to understand the working of entropy in
machine learning by exploring the underlying concept of probability theory, how
the formula works, its significance, and why it is important for the Decision Tree
algorithm.
What is Entropy in Machine Learning?
In Machine Learning, entropy measures the level of disorder or uncertainty in
a given dataset or system. It is a metric that quantifies the amount of
information in a dataset, and it is commonly used to evaluate the quality of a
model and its ability to make accurate predictions.
A higher entropy value indicates a more heterogeneous dataset with diverse
classes, while a lower entropy signifies a more pure and homogeneous subset of
data. Decision tree models can use entropy to determine the best splits to make
informed decisions and build accurate predictive models.
The Origin of Entropy
Claude E. Shannon’s 1948 paper on “A Mathematical Theory of Communication ”
marked the birth of information theory. He aimed to mathematically measure the
statistical nature of lost information in phone-line signals and proposed
information entropy to estimate uncertainty reduced by a message. Entropy
measures the amount of surprise and data present in a variable. In information
theory, a random variable’s entropy reflects the average uncertainty level in its
possible outcomes. Events with higher uncertainty have higher entropy.
Information theory finds applications in machine learning models, including
Decision Trees. Understanding entropy helps improve data storage,
communication, and decision-making.
What is a Decision Tree in Machine Learning?
The Decision Tree is a popular supervised learning technique in machine
learning, serving as a hierarchical if-else statement based on feature comparison
operators. It is used for regression and classification problems, finding
relationships between predictor and response variables. The tree structure
includes Root, Branch, and Leaf nodes, representing all possible outcomes
based on specific conditions or rules. The algorithm aims to create homogenous
Leaf nodes containing records of a single type in the outcome variable. However,
sometimes restrictions may lead to mixed outcomes in the Leaf nodes. To build
the tree, the algorithm selects features and thresholds by optimizing a loss
function, aiming for the most accurate predictions. Decision Trees offer
interpretable models and are widely used for various applications, from simple
binary classification to complex decision-making tasks.
Components of a Decision Tree
Root Node: This is where the tree starts. It represents the entire dataset and is
divided into branches based on a chosen feature.
Internal Nodes: These nodes represent the questions or conditions asked about
the features. They lead to further branches or child nodes.
Branches and Edges: These show the possible outcomes of a condition. They
lead to child nodes or leaves.
Leaves (Terminal Nodes): These are the end points of the tree. They represent
the final decision or prediction.
Cost Function in a Decision Tree
The decision tree algorithm builds the tree from the dataset via the
optimization of a cost function. In the case of classification problems, the cost
or the loss function is a measure of impurity in the target column of the nodes
descending from a root node.
The impurity is nothing but the surprise or the uncertainty available in the
information that we had discussed above. At a given node, the impurity is a
measure of a mixture of different classes or in our case a mix of different car
types in the Y variable. Hence, the impurity is also referred to as heterogeneity
present in the information or at every node.
The goal is to minimize this impurity as much as possible at the leaf (or the end-
outcome) nodes. It means the objective function is to decrease the impurity (i.e.
uncertainty or surprise) of the target column or in other words, to increase the
homogeneity of the Y variable at every split of the given data.
To understand the objective function, we need to understand how the impurity or
the heterogeneity of the target column is computed. There are two metrics to
estimate this impurity: Entropy and Gini. In addition to this, to answer the
previous question on how the decision tree chooses the attributes, there are
various splitting methods including Chi-square, Gini index, and Entropy; however,
the focus here is on Entropy, and we will further explore how it helps to create the
tree.
Example of Cost Function in a Decision Tree
Now, it’s been a while since I have been talking about a lot of theory stuff. Let’s
do one thing: I offer you coffee and we perform an experiment. I have a box full
of an equal number of coffee pouches of two flavors: Caramel Latte and the
regular, Cappuccino. You may choose either of the flavors but with eyes closed.
The fun part is: in case you get the caramel latte pouch then you are free to stop
reading this article 🙂 or if you get the cappuccino pouch then you would have to
read the article till the end 🙂
This predicament where you would have to decide and this decision of yours that
can lead to results with equal probability is nothing else but said to be the state
of maximum uncertainty. In case, I had only caramel latte coffee pouches or
cappuccino pouches then we know what the outcome would have been and
hence the uncertainty (or surprise) will be zero.
The probability of getting each outcome of a caramel latte pouch or
cappuccino pouch is:
P(Coffee pouch == Caramel Latte) = 0.50
P(Coffee pouch == Cappuccino) = 1 – 0.50 = 0.50
When we have only one result either caramel latte or cappuccino pouch, then in
the absence of uncertainty, the probability of the event is:
P(Coffee pouch == Caramel Latte) = 1
P(Coffee pouch == Cappuccino) = 1 – 1 = 0
There is a relationship between heterogeneity and uncertainty; the more
heterogeneous the event the more uncertainty. On the other hand, the less
heterogeneous, or so to say, the more homogeneous the event, the lesser is the
uncertainty. The uncertainty is expressed as Gini or Entropy.
How Does Entropy Actually Work?
Claude E. Shannon had expressed this relationship between the probability and
the heterogeneity or impurity in the mathematical form with the help of the
following equation:
H(X) = – Σ (pi * log2(pi))
The uncertainty or the impurity is expressed using the log to base 2 of the
probability of a category (pi). The index i runs over the possible
categories; here there are two categories, as our problem is a binary classification.
This equation is graphically depicted by a symmetric curve as shown below. On
the x-axis is the probability of the event and the y-axis indicates the
heterogeneity or the impurity denoted by H(X).
Example of Entropy in Machine Learning
We will explore how the curve works in detail and then shall illustrate the
calculation of entropy for our coffee flavor experiment.
The term log2(pi) has a very useful property: when there are only two outcomes
and the probability of the event pi is either 1 or 0.50, then
log2(pi) takes the following values (ignoring the negative sign):
pi | log2(pi) (ignoring the sign)
1 | 0
0.50 | 1
Now, plotting these values of the probability against log2(pi), the catch is that
when the probability pi approaches 0, the magnitude of log2(pi) grows
towards infinity and the curve is no longer bounded.
The entropy or the impurity measure should only take values from 0 to 1, since the
probability ranges from 0 to 1, and hence we do not want the above situation. So,
to bring the curve and the value of log2(pi) back towards zero, we multiply log2(pi)
by the probability, i.e. by pi itself.
Therefore, the expression becomes (pi * log2(pi)). Because log2(pi) returns a negative
value for probabilities below 1, we multiply the result by a
negative sign to remove this effect, and the equation finally becomes:
H(X) = – Σ (pi * log2(pi))
Now, this expression can be used to show how the uncertainty changes
depending on the likelihood of an event.
The resulting curve is bounded between 0 and 1: it peaks at 1 when the probability
is 0.5 and falls to 0 when the probability is 0 or 1.
This scale of entropy from 0 to 1 is for binary classification problems. For a
multiple classification problem, the above relationship holds, however, the scale
may change.
Calculation of Entropy in Python
We shall estimate the entropy for three different scenarios. The event Y is getting
a caramel latte coffee pouch. The heterogeneity or the impurity formula for two
different classes is as follows:
H(X) = – [(pi * log2(pi)) + (qi * log2(qi))]
where,
pi = Probability of Y = 1 i.e. probability of success of the event
qi = Probability of Y = 0 i.e. probability of failure of the event
Case 1
Coffee flavor | Quantity of Pouches | Probability
Caramel Latte | 7 | 0.7
Cappuccino | 3 | 0.3
Total | 10 | 1
H(X) = – [(0.70 * log2(0.70)) + (0.30 * log2(0.30))] = 0.88129089
This value 0.88129089 is the measurement of uncertainty when given the box full
of coffee pouches and asked to pull out one of the pouches when there are
seven pouches of caramel latte flavor and three pouches of cappuccino flavor.
Case 2
Coffee flavor | Quantity of Pouches | Probability
Caramel Latte | 5 | 0.5
Cappuccino | 5 | 0.5
Total | 10 | 1
H(X) = – [(0.50 * log2(0.50)) + (0.50 * log2(0.50))] = 1
Case 3
Coffee flavor | Quantity of Pouches | Probability
Caramel Latte | 10 | 1
Cappuccino | 0 | 0
Total | 10 | 1
H(X) = – [(1.0 * log2(1.0)) + (0 * log2(0))] = 0 (taking 0 * log2(0) to be 0)
In scenarios 2 and 3, we can see that the entropy is 1 and 0, respectively. In
scenario 3, when we have only one flavor of coffee pouch, caramel latte, and
have removed all the pouches of cappuccino flavor, the uncertainty or the
surprise is completely removed and the entropy is zero. We
can then conclude that the outcome is fully determined and carries no surprise.
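The same numbers can be reproduced with a short Python helper (a minimal sketch using NumPy; the 0 * log2(0) term is treated as 0 by convention):

```python
import numpy as np

def entropy(probabilities):
    """Shannon entropy H(X) = -sum(p * log2(p)), treating 0 * log2(0) as 0."""
    probabilities = np.asarray(probabilities, dtype=float)
    nonzero = probabilities[probabilities > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

print(entropy([0.7, 0.3]))   # ~0.8813  (Case 1)
print(entropy([0.5, 0.5]))   # 1.0      (Case 2)
print(entropy([1.0, 0.0]))   # 0.0      (Case 3)
```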
Use of Entropy in Decision Tree
As we have seen above, in decision trees the cost function is to minimize the
heterogeneity in the leaf nodes. Therefore, the aim is to find out the attributes
and within those attributes the threshold such that when the data is split into two,
we achieve the maximum possible homogeneity or in other words, results in the
maximum drop in the entropy within the two tree levels.
At the root level, the entropy of the target column is estimated via the formula
proposed by Shannon. After a split, the entropy computed for the
target column is a weighted entropy: the entropy of each child node weighted
by the fraction of the records that fall into that node.
The more the entropy decreases, the more information is gained.
Information Gain is the reduction in entropy achieved by a split: the entropy of
the parent node minus the weighted entropy of the child nodes. For the three
scenarios above, where the parent entropy is 1, it works out to 1 – entropy. The
entropy and information gain for the three scenarios are as follows:
Case | Entropy | Information Gain
Case 1 | 0.88129089 | 0.11870911
Case 2 | 1 | 0
Case 3 | 0 | 1
Estimation of Entropy and Information Gain at Node Level
We have the following tree with a total of four values at the root node that is split
into the first level having one value in one branch (say, Branch 1) and three
values in the other branch (Branch 2). The entropy at the root node is 1.
Now, to compute the entropy of the child node that holds three values, the class
proportions within that node are ⅓ and ⅔, and Shannon's entropy
formula is applied to them. As we saw above, the entropy of the child node
holding a single value is zero, because there is only one value in that node,
meaning there is no uncertainty and hence no heterogeneity present.
H(X) = – [(1/3 * log2(1/3)) + (2/3 * log2(2/3))] = 0.9184
The information gain for the above tree is the reduction in the weighted average
of the entropy.
Information Gain = 1 – (3/4 * 0.9184) – (1/4 * 0) = 0.3112
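The same node-level calculation can be sketched in Python, reusing the entropy helper from above:

```python
import numpy as np

def entropy(probabilities):
    probabilities = np.asarray(probabilities, dtype=float)
    nonzero = probabilities[probabilities > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

# Root node: 2 records of each class -> entropy 1.
parent_entropy = entropy([0.5, 0.5])

# One branch holds 1 of the 4 records (pure), the other holds 3 records mixed 1:2.
weighted_children = (1 / 4) * entropy([1.0]) + (3 / 4) * entropy([1 / 3, 2 / 3])

information_gain = parent_entropy - weighted_children
print(round(information_gain, 4))   # ~0.3113 (0.3112 above comes from rounding 0.9184 first)
```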
Conclusion
Information Entropy or Shannon’s entropy quantifies the amount of uncertainty
(or surprise) involved in the value of a random variable or the outcome of a
random process. Its significance in the decision tree is that it allows us to
estimate the impurity or heterogeneity of the target variable. Subsequently, to
achieve the maximum level of homogeneity in the response variable, the child
nodes are created in such a way that the total entropy of these child nodes must
be less than the entropy of the parent node.
Entropy plays a fundamental role in machine learning, enabling us to measure
uncertainty and information content in data. Understanding entropy is crucial for
building accurate decision trees and improving various learning models.