Machine Learning Algorithm - Hope

Do you think Machine Learning is only for the Computer Science Department?

[Figure: subject bubbles surrounding the question - Mechanical, EEE, ECE, Civil, Wireless Networks, Physics, Chemistry, Statistics, History, Mathematics]
Machine Learning

Past data  Algorithm  Output (final model)

Example: past data on heart disease or forest fires goes into an algorithm such as SVM or KNN, which produces the final model.
Types of Problem Statement in Machine Learning
Supervised Learning

Past data (input data) = input variables (features) + output (label)
Takeaway for Supervised Learning

The requirement should be clear.

Input and output are well defined.
SCREENING-1

Types of Problem Statement in Machine Learning
Unsupervised Learning

Past data (input data) = input variables (features) only - there is no output label.
Takeaway for Unsupervised Learning

We don't know in advance what we need.

There is only input data, so we can do clustering.
Types of Problem Statement in Machine Learning
Semi-Supervised Learning

Past data (input data) = input variables (features) + output (label) for only part of the data.
Takeaway for Semi-Supervised Learning

We know the requirement, but half of the outputs are not labelled.
SCREENING - 2

Machine Learning Problem Identification
Problem Identification on Supervised Learning: Classification

Classification assigns the output to a category based on the input parameters.

The output is a categorical value: Yes/No, Dog/Cat, House/Not house.
Problem Identification on Supervised Learning: Regression

The output is a numerical value.
See the Picture, Tell a Story
Linear Graph
Multiple Linear
Artificial Intelligence > Machine Learning > Deep Learning

Machine Learning: Supervised Learning | Unsupervised Learning | Semi-Supervised Learning

Regression:
 Simple Linear Regression
 Multiple Linear Regression
 Logistic Regression
 Polynomial Regression
 Support Vector Machine Regression
 Decision Tree Regression
 Random Forest Regression

Classification:
 Logistic Classification
 K-Nearest Neighbor (K-NN)
 Support Vector Machine (SVM)
 Kernel SVM
 Naive Bayes
 Decision Tree Classification
 Random Forest Classification

Clustering:
 K-Means Clustering
 Hierarchical Clustering
Algorithms

Algorithms for both Regression and Classification:
 Support Vector Machine
 Decision Tree
 Random Forest

Regression algorithms:
 Simple Linear
 Multiple Linear
 Polynomial Regression

Classification algorithms:
 Logistic
 Naive Bayes
 KNN
Polynomial Graph

[Figure: polynomial curves of increasing degree.]
Image Source: http://www.egwald.ca/linearalgebra/polynomials.php
 Takeaway Concepts in Simple Linear Regression

 Why is it called Simple Linear Regression?

 How does this regression help with future prediction?

 Validating parameters:
  Sum of Square Error (SSE) or Residual Sum of Square (RSS)
  Sum of Square Regression (SSR) or Explained Sum of Square (ESS)
  Sum of Square Total (SST)
  R-Squared (R²)
  Adjusted R-Squared

 When to use simple linear regression?
 Why is it called Simple Linear Regression?

It has one input (X, the independent variable) and one output (y, the dependent or response variable).

It uses the straight-line equation y = mX + c

Where,
y = output (the straight line formed by fitting all the data points)

m = slope = dy/dx = weight (the constant rate at which y changes as X changes)

X = input (if the input changes, the output changes accordingly)

c = bias = intercept = the starting value of the line, i.e. the value of y when X = 0.
#learnaiwithramisha How this regression helps for future prediction?
SIMPLE LINEAR REGRESSION

Dataset

Predicted Value X Y
10
Y=0.3X+0.5 1 2
Y Dependant Variable 8 2 4
6 3 6
4 8
4
w 5 10

2 Initial Value(b)/Minimum value/ Origin

0 3 5
1 2 4
Y =wX +b X Independent Variable
y)( n (∑ 𝑥𝑦 ¿−(∑ x ) ( ∑ 𝑦)
W= Slope= ___________________ B= Bias=Initial Value=Minimum Value= ___________________
( (

www.hopelearning.net Hope_Artificial_Intelligence:HAI 24
@hope_artificial_intelligence
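As a minimal Python sketch, the two formulas can be checked directly on the dataset from the table above:

# least-squares slope and intercept, computed from the sums in the formulas above
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

w = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b = (sum_y - w * sum_x) / n                                   # bias / intercept
print(w, b)  # 2.0 0.0 - this toy data is perfectly linear (Y = 2X)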
Types of scattered data with a linear regression line

[Figure: four scatter plots (axes: I.V = independent variable, D.V = dependent variable) showing different spreads of actual values around the predicted regression line.]
#learnaiwithramisha Validating parameter: 1 .Sum of Square Error(SSE) or Residual Sum of Square(RSS)

Formula:
yi

10 Error

−Y Error= Actual Value(yi) – Predicted Value(yi)
Y Dependant Variable

8
Where, i = Observation point
6
n= number of Observation point
_
4 = Actual Value

2 =Predicted Value

0
1 2 3 4 5
X Independent Variable
www.hopelearning.net Hope_Artificial_Intelligence:HAI 26
@hope_artificial_intelligence
Error

Input | Actual output | Predicted output | Error² = (Actual - Predicted)²
1     | 3.8           | 3.5              | 0.09
3     | 4.5           | 4.7              | 0.04
4     | 5.6           | 5.3              | 0.09
5     | 4.6           | 1.4              | 10.24
6     | 2.3           | 3.4              | 1.21
9     | 7.6           | 7.1              | 0.25
10    | 3.4           | 2.3              | 1.21
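The SSE is just the sum of the last column; a quick check in Python using the table above:

actual    = [3.8, 4.5, 5.6, 4.6, 2.3, 7.6, 3.4]
predicted = [3.5, 4.7, 5.3, 1.4, 3.4, 7.1, 2.3]
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
print(round(sse, 2))  # 13.13 - dominated by the 10.24 outlier at input 5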
#learnaiwithramisha Validating parameter: 1.Sum of Square Error(SSE) or Residual Sum of Square(RSS)

Formula:
𝑛
𝑆𝑢𝑚 𝑜𝑓 𝑆𝑞𝑢𝑎𝑟𝑒 𝐸𝑟𝑟𝑜𝑟 (𝑆𝑆𝐸)=∑ ( 𝑦𝑖−−
𝑦𝑖 ) 2
r 𝑖=0
r ro Where, i = Observation point
reE yi

10 u a
f Sq −Y n= number of Observation point
o
m _
Su
Y Dependant Variable

8 = Actual Value

6 =Predicted Value

Take away:
2 If,
Higher the SSE, then predicted value is poor
0 Smaller the SSE, then predicted value is good
1 2 3 4 5
X Independent Variable

www.hopelearning.net Hope_Artificial_Intelligence:HAI 28
@hope_artificial_intelligence
#learnaiwithramisha Validating parameter: 2.Sum of Square Regression(SSR) or Explained Sum of Square(ESS)

Formula:
𝑛
𝑆𝑢𝑚 𝑜𝑓 𝑆𝑞𝑢𝑎𝑟𝑒 𝑅𝑒𝑔𝑟𝑒𝑠𝑠𝑖𝑜𝑛(𝑆𝑆𝑅)=∑−
( 𝑦 𝑖− 𝑦𝑚𝑒𝑎𝑛 ) 2
𝑖=0

10 Where, i = Observation point


−Y
Y Dependant Variable

8 n= number of Observation point


SSR _
6 =Predicted Value
ymean
= Mean of Dependant
4 Variable(Response variable)

2
Take away:
 Higher the SSR(or)ESS, better the model performance
0
1 2 3 4 5

X Independent Variable
www.hopelearning.net Hope_Artificial_Intelligence:HAI 29
@hope_artificial_intelligence
#learnaiwithramisha Validating parameter: 3.Sum of Square Total(SST)

Formula: 𝑛
𝑆𝑢𝑚 𝑆𝑞𝑢𝑎𝑟𝑒 𝑇𝑜𝑡𝑎𝑙(𝑆𝑆𝑇 )=∑ ( 𝑦 𝑖 − 𝑦𝑚𝑒𝑎𝑛 ) 2
𝑖=0
SST= SSR+SSE
Where, i = Observation point
Y
10
n= number of Observation point
Y Dependant Variable

8 SST = Actual Value

6 = Mean of Dependant
ymean
Variable(Response variable)
4

2
Take away:
0
1 2 3 4 5 If,
Smaller the SST, better the model
X Independent Variable
www.hopelearning.net Hope_Artificial_Intelligence:HAI
@hope_artificial_intelligence
#learnaiwithramisha Validating parameter: 4. R Squared(R2)

𝑛
_
SSR
∑_________________
( 𝑦 𝑖− 𝑦𝑚𝑒𝑎𝑛 ) 2 Purpose of R2 :
To know , how well the model is fitted.
______ = 𝑖= 0
R 2 =
SST 𝑛

∑ ( 𝑦 𝑖− 𝑦𝑚𝑒𝑎𝑛 ) 2 How R2 differs from other parameters like SSE,SSR and SST ?
𝑖= 0
SSE, SSR and SST range varies with dataset to dataset.
But,
Where, i = Observation point R2 exists between 0 and 1
If,
n= number of Observation point R2 = nearly to 1, then built model has better performance.
_ R2 = nearly to 0, then built model has poor performance.
= Actual Value
The only drawback of R2 is that if new predictors (X) are added
=Predicted Value to our model, R2 only increases or remains constant but it never
decreases. We can not judge that by increasing complexity of
= Mean of Dependant our model, are we making it more accurate
Variable(Response variable)
www.hopelearning.net Hope_Artificial_Intelligence:HAI
@hope_artificial_intelligence
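A small sketch computing these validation parameters (plus adjusted R², defined two slides below) from the error table above. Because those predictions are not an exact least-squares fit, SSR + SSE only approximately equals SST here, so R² is computed as 1 - SSE/SST:

actual    = [3.8, 4.5, 5.6, 4.6, 2.3, 7.6, 3.4]
predicted = [3.5, 4.7, 5.3, 1.4, 3.4, 7.1, 2.3]
y_mean = sum(actual) / len(actual)

sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
ssr = sum((p - y_mean) ** 2 for p in predicted)             # explained variation
sst = sum((a - y_mean) ** 2 for a in actual)                # total variation

r2 = 1 - sse / sst
n, k = len(actual), 1            # k = number of independent variables
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(sse, ssr, sst, r2, adj_r2)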
Graphs with Equation
Validating parameter 5: Adjusted R²

Adjusted R² = 1 - (1 - R²) · (n - 1) / (n - k - 1)

where n = number of observations,
k = number of independent variables.

R²:
 R² shows how well the model fits the actual data points.
 R² increases whenever a new independent variable is added to the existing model, whether that variable is poorly or highly significant.
 It takes only positive values, between 0 and 1.

Adjusted R²:
 Adjusted R² helps find the most significant independent variables.
 Adjusted R² increases only when a significant independent variable is added to the model; otherwise it stays constant or decreases.
 It may take negative values.
Simple Linear Regression - Takeaway Points
Model = 0.3X + 5 (for understanding: 0.3 = weight or slope, 5 = intercept or initial value)

Concept name | Formula | Inference
Simple Linear Regression | Y = wX + b, with w = (n Σxy - Σx Σy) / (n Σx² - (Σx)²) and b = (Σy - w Σx) / n | One dependent variable and one independent variable; the spread of the data should be linear.
Sum of Square Error (SSE), or Residual Sum of Square (RSS), or Unexplained Sum of Square | SSE = Σ (actual value - predicted value)² | SSE maximum: poor model; SSE minimum: better model. (Note: the SSE range varies from dataset to dataset.)
Sum of Square Regression (SSR), or Explained Sum of Square (ESS) | SSR = Σ (predicted value - mean of dependent variable)² | SSR maximum: better model; SSR minimum: poor model. (Note: the SSR range varies from dataset to dataset.)
Sum of Square Total (SST) | SST = Σ (actual value - mean of dependent variable)² = SSR + SSE | Fixed by the dataset; the model is better when SSR is the larger share of SST. (Note: the SST range varies from dataset to dataset.)
R-Squared (R²) | R² = SSR / SST | R² close to 1: the built model performs well; R² close to 0: the built model performs poorly. (Note: R² lies between 0 and 1.)
Assumptions for Linear Regression

BEFORE THE MODEL

The Quantitative Data Condition  the data should be numeric.

The Straight Enough Condition (or "linearity")  the data should follow a linear pattern.

The Outlier Condition  the data should not have outliers.

AFTER THE MODEL

Normality of errors.

Homoscedasticity  the variance should be equal across the overall spread.

Log transform  if the spread is slightly curved, a log transform can be used to make it linear.
The Purpose of the Training and Test Set

Original dataset: 100  Training dataset: 80 + Test set: 20
Training dataset: 80

Using the training dataset, the algorithm learns the weights and bias, e.g.
Y = 0.3·x1 + 0.4·x2 + 0
and this becomes the model.
Test set: 20

Each test row (x1, x2, Y) is fed into the model found using the training set, Y = 0.3·x1 + 0.4·x2 + 0; for example, one row predicts Y = 27.95.

if y > 30:
    print("Unfit")
else:
    print("fit")
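A minimal end-to-end sketch of this split-train-predict flow with scikit-learn (assumed available); the feature rows and targets below are hypothetical stand-ins for the slide's table:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = [[60, 20], [65, 25], [70, 30], [75, 35], [80, 40]]  # hypothetical x1, x2 rows
y = [23.0, 29.5, 30.5, 36.5, 40.0]                      # hypothetical targets

# 80/20 split, as on the slides
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # learns the weights and bias
for pred in model.predict(X_test):
    print("Unfit" if pred > 30 else "fit")        # threshold rule from the slide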
Types of fitting
Steps

Problem identification  regression or classification

Check the data pattern

Split the dataset into a training set and a test set

Build the model (on the training set)

Compute the validating parameters (on the test set)

Check the assumptions  before the model and after the model (for a linear problem)
Multiple Linear Regression

[Figures: multiple linear regression fits a plane through the data, shown alongside the simple linear regression line for comparison.]
Assumptions for Multiple Linear Regression

BEFORE THE MODEL

The Quantitative Data Condition  the data should be numeric.

The Straight Enough Condition (or "linearity")  the data should follow a linear pattern.

The Outlier Condition  the data should not have outliers.

AFTER THE MODEL

Homoscedasticity  the variance should be equal across the overall spread.

No multicollinearity  the independent variables should not be strongly correlated with each other.
Algorithms

LINEAR ALGORITHMS
Simple Linear
Multiple Linear

NON-LINEAR ALGORITHMS
Polynomial
Support Vector Machine
Decision Tree
Random Forest
KNN
Naive Bayes
Problem Statement of a Non-Linear Algorithm

Finding the true underlying pattern in the data.
Types of Fitting | Overfitting, Underfitting, Well-fitting
Polynomial Regression

[Figure: polynomial regression fits a curve through non-linear data.]
Comparison

[Figure: side-by-side graphs of Simple Linear Regression, Multiple Linear Regression, and Polynomial Regression.]
Assumptions for Polynomial Regression

The data should be quantitative.

There should be no outliers.

The data spread should follow a curve.
Support Vector Machine

[Figure sequence from https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93:
 Separating two classes with a hyperplane
 What if a closer data point exists?
 Non-separable dataset
 Non-separable dataset - three dimensional
 Non-separable dataset - three dimensional - one plane
 Non-separable dataset - three dimensional - 3 planes]
Assumptions for Support Vector Machine

The data spread should follow a non-linear pattern.

There should be no outliers.
Decision Tree

[Figure: example decision tree.]
Image Source: https://towardsdatascience.com/random-forest-a-powerful-ensemble-learning-algorithm-2bf132ba639d

Important Terminology related to Decision Trees


1.Root Node: It represents the entire population or sample and this further gets divided into two or more
homogeneous sets.
2.Splitting: It is a process of dividing a node into two or more sub-nodes.
3.Decision Node: When a sub-node splits into further sub-nodes, then it is called the decision node.
4.Leaf / Terminal Node: Nodes do not split is called Leaf or Terminal node.
5.Pruning: When we remove sub-nodes of a decision node, this process is called pruning. You can say
the opposite process of splitting.
6.Branch / Sub-Tree: A subsection of the entire tree is called branch or sub-tree.
7.Parent and Child Node: A node, which is divided into sub-nodes is called a parent node of sub-nodes
whereas sub-nodes are the child of a parent node.

www.hopelearning.net Hope_Artificial_Intelligence:HAI 64
@hope_artificial_intelligence
How to select the best variable from the dataset for the Root Node

Entropy, Information Gain, Gini Index, Gain Ratio, Reduction in Variance
How to select the best variable from the dataset for the Root Node: Entropy

Entropy measures randomness (H = - Σ p · log₂ p over the class proportions p).
If entropy is larger, randomness is high and the node cannot be predicted perfectly, and vice versa.

ID3 follows the rule: a branch with an entropy of zero is a leaf node, and a branch with entropy greater than zero needs further splitting.
How to select the best variable from the dataset for the Root Node: Information Gain

Constructing a decision tree is all about finding the attribute that returns the highest information gain and the smallest entropy.

Gain = entropy(before) - Σ (nⱼ / n) · entropy(j, after), summed over j = 1..K

where "before" is the dataset before the split, K is the number of subsets generated by the split, and (j, after) is subset j after the split.
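A minimal sketch of both quantities, assuming class labels held in plain Python lists:

import math

def entropy(labels):
    # H = -sum(p * log2(p)) over the proportion p of each class
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(before, subsets):
    # entropy before the split minus the weighted entropy of the subsets after it
    n = len(before)
    return entropy(before) - sum(len(s) / n * entropy(s) for s in subsets)

parent = ["yes", "yes", "yes", "no", "no", "no"]
split  = [["yes", "yes", "yes"], ["no", "no", "no"]]  # a perfect split
print(entropy(parent))                  # 1.0
print(information_gain(parent, split))  # 1.0 (each child has entropy 0)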
How to select the best variable from the dataset for the Root Node: Gini Index

You can understand the Gini index as a cost function used to evaluate splits in the dataset.
It is calculated by subtracting the sum of the squared class probabilities from one: Gini = 1 - Σ p².
It favours larger partitions and is easy to implement, whereas information gain favours smaller partitions with distinct values.
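The same idea as a small helper, for comparison with the entropy sketch above:

def gini(labels):
    # Gini impurity = 1 - sum of squared class probabilities
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["yes", "yes", "yes", "no", "no", "no"]))  # 0.5, maximum impurity for two classes
print(gini(["yes", "yes", "yes", "yes"]))             # 0.0, a pure node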
How to select the best variable from the dataset for the Root Node: Gain Ratio

Gain ratio overcomes a problem with information gain by taking into account the number of branches that would result before making the split.
It corrects information gain by taking the intrinsic information of a split into account.
How to select the best variable from the dataset for the Root Node: Reduction in Variance

This criterion, used for regression trees, applies the standard formula of variance to choose the best split: the split with the lower weighted variance across its sub-nodes is selected.
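A minimal sketch of the criterion, assuming numeric targets in plain lists:

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def weighted_variance(subsets):
    # weighted average variance of the child nodes; the split minimising this wins
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * variance(s) for s in subsets)

parent = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
split  = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.8]]
print(variance(parent), weighted_variance(split))  # the split's variance is far lower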
How to avoid/counter overfitting in Decision Trees?

Pruning the decision tree.

[Figure: a full tree before pruning and the smaller tree after pruning.]
Important points for Decision Tree

There is a possibility of overfitting because of the huge number of decision splits.

If the training data changes, the resulting model can change drastically.
Random Forest

 Ensemble learning

 Bagging, or bootstrap aggregation

 Random feature selection
#learnaiwithramisha Random Forest
Ensemble Learning

Image Source: https://towardsdatascience.com/random-forest-a-powerful-ensemble-learning-algorithm-2bf132ba639d

www.hopelearning.net Hope_Artificial_Intelligence:HAI 79
@hope_artificial_intelligence
#learnaiwithramisha Random Forest
Ensemble Learning

Image Source: https://towardsdatascience.com/random-forest-a-powerful-ensemble-learning-algorithm-2bf132ba639d

www.hopelearning.net 80
Hope_Artificial_Intelligence:HAI @hope_artificial_intelligence
#learnaiwithramisha
Random Forest
Bagging or Bootstrap Aggregation

www.hopelearning.net Hope_Artificial_Intelligence:HAI 81
@hope_artificial_intelligence
#learnaiwithramisha
Random Forest
Ensemble Learning

www.hopelearning.net Hope_Artificial_Intelligence:HAI 82
@hope_artificial_intelligence
#learnaiwithramisha
Random Forest

Image Source: https://towardsdatascience.com/random-forest-a-powerful-ensemble-learning-algorithm-2bf132ba639d

www.hopelearning.net Hope_Artificial_Intelligence:HAI 83
@hope_artificial_intelligence
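A minimal scikit-learn sketch (library assumed available) showing the two ingredients named above, bagging and random feature selection, on a bundled toy dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of bagged trees; max_features limits each split to a random feature subset
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy on the held-out test set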
Pure Classification Algorithms

K-Nearest Neighbor

Naive Bayes
K-Nearest Neighbour

[Figures: a new point is classified by the majority class among its k nearest neighbours.]
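A minimal scikit-learn sketch (library assumed available); the points and labels are hypothetical:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 3], [3, 3], [8, 8], [9, 10], [10, 9]]  # hypothetical 2-D points
y = [0, 0, 0, 1, 1, 1]                                  # two classes

knn = KNeighborsClassifier(n_neighbors=3)  # the 3 nearest neighbours vote on the class
knn.fit(X, y)
print(knn.predict([[2, 2], [9, 9]]))       # -> [0 1]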
Bias-Variance Trade-off

Low bias, low variance  good model

Low bias, high variance  overfitting model

High bias, low variance  underfitting model

High bias, high variance  poor model

Image Source: https://medium.com/30-days-of-machine-learning/day-3-k-nearest-neighbors-and-bias-variance-tradeoff-75f84d515bdb
Naive Bayes

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem.

Conditional probability is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred.
Naive Bayes

In simpler terms, Bayes' Theorem is a way of finding a probability when we know certain other probabilities.
Naive Bayes

Bayes' Theorem: P(y | X) = P(X | y) · P(y) / P(X)
Naive Bayes: Assumptions

Independent  the variables should not have any connection to each other.

Equal  all the variables are equally important.
Naive Bayes

The variable y is the class variable (stolen?), which represents whether the car is stolen given the conditions. The variable X represents the parameters/features.
Naive Bayes

[Worked example: comparing P(no | Red, SUV, Domestic) = 0.144 with P(yes | Red, SUV, Domestic) = 0.048.]

Since 0.144 > 0.048, given the features Red, SUV and Domestic, the example gets classified as 'NO': the car is not stolen.
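A sketch of the same decision rule in Python; the priors and likelihoods below are hypothetical stand-ins, since the slide's numbers (0.144 vs 0.048) come from its own dataset, which is not reproduced here:

# P(features | class) under the naive independence assumption = product of per-feature terms
p_yes, p_no = 0.5, 0.5        # hypothetical class priors
like_yes = 0.6 * 0.3 * 0.4    # P(Red|yes) * P(SUV|yes) * P(Domestic|yes), assumed
like_no  = 0.4 * 0.6 * 0.6    # P(Red|no)  * P(SUV|no)  * P(Domestic|no),  assumed

score_yes = like_yes * p_yes  # proportional to P(yes | Red, SUV, Domestic)
score_no  = like_no * p_no    # proportional to P(no  | Red, SUV, Domestic)
print("stolen" if score_yes > score_no else "not stolen")  # -> not stolen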
Naive Bayes: The zero-frequency problem

If a feature value never occurs with a class in the training data, its conditional probability is zero, which wipes out the whole product. The usual fix is Laplace (add-one) smoothing: add 1 to every count.
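A one-function sketch of add-one smoothing (the constant k below is the number of distinct values the feature can take):

def smoothed_probability(count, total, k):
    # Laplace / add-one smoothing: unseen values get a small non-zero probability
    return (count + 1) / (total + k)

print(smoothed_probability(0, 10, 3))  # 1/13 instead of 0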
Types of Algorithm based on the spread of the data

Linear algorithms:
• Linear
• Multiple Linear

Non-linear algorithms (for one problem statement, four candidate models):
• Polynomial
• Support Vector Machine
• Decision Tree
• Random Forest
Validating Parameters for Supervised Learning

Classification  Confusion Matrix

Regression  Minimum Error
Validating Parameters for Supervised Learning: Classification  Confusion Matrix

                           Predicted Class
                     Positive                             Negative
Actual   Positive    True Positive (TP)                   False Negative (FN)  Type II error
Class    Negative    False Positive (FP)  Type I error   True Negative (TN)

Positive (P): the observation is positive (for example: is an apple).
Negative (N): the observation is not positive (for example: is not an apple).
True Positive (TP): the observation is positive, and is predicted to be positive.
False Negative (FN): the observation is positive, but is predicted negative (Type II error).
False Positive (FP): the observation is negative, but is predicted positive (Type I error).
True Negative (TN): the observation is negative, and is predicted to be negative.
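A minimal scikit-learn sketch (library assumed available) that builds the matrix and the usual summary scores from hypothetical labels:

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical actual classes
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical predicted classes

# rows = actual, columns = predicted; for labels [0, 1]: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))   # (TP + TN) / total
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)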
Logistic Algorithm

[Figure: the S-shaped logistic (sigmoid) curve mapping any input to a value between 0 and 1.]
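A minimal sketch of the curve the slides plot, with a hypothetical learned weight and bias; probabilities above 0.5 go to class 1:

import math

def sigmoid(z):
    # squashes any real number into (0, 1), read as a class-1 probability
    return 1 / (1 + math.exp(-z))

w, b = 0.8, -4.0              # hypothetical learned weight and bias
for x in [2, 5, 8]:
    p = sigmoid(w * x + b)
    print(x, round(p, 3), "class 1" if p > 0.5 else "class 0")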
Unsupervised: Clustering Algorithms

K-Means

Hierarchical
K-Means

[Figures: points are assigned to the nearest of k centroids, and the centroids are recomputed until the clusters stabilise.]
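A minimal scikit-learn sketch (library assumed available) on hypothetical unlabelled points:

from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]  # hypothetical unlabelled points

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the two learned centroids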
Hierarchical

Agglomerative (see the sketch after this slide):
• Compute the proximity matrix
• Let each data point be a cluster
• Repeat: merge the two closest clusters and update the proximity matrix
• Until only a single cluster remains

Divisive:
• The opposite of agglomerative: start from one cluster and repeatedly split

[Figure: dendrogram produced by agglomerative clustering.]
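A minimal scikit-learn sketch (library assumed available) of the agglomerative procedure described above; single linkage merges the two closest clusters at each step:

from sklearn.cluster import AgglomerativeClustering

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]  # hypothetical points

# every point starts as its own cluster; the two closest are merged repeatedly
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
print(agg.labels_)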
References

https://towardsdatascience.com/the-complete-guide-to-decision-trees-28a4e3c7be14
https://www.listendata.com/2018/01/linear-regression-in-python.html
https://www.statsmodels.org/dev/examples/notebooks/generated/regression_diagnostics.html