Introduction to Machine Learning
Dr. Saketh Athkuri
What we have seen so far
Statistics
• Mean
• Median
• Mode
• IQR
• CI
• Hypothesis testing
• t-test
• z-test
• χ²-test
• F-test
ML definition
A computer program is said to learn from Experience E with respect to
task T and performance measure P, if its performance at task T as
measured by P improves with experience E.
Examples: Alan Turing (the Enigma code), loan approval.
1. Mitchell T M (1997), Machine Learning. McGraw-Hill, New York
2. https://www.wordstream.com/blog/ws/2017/07/28/machine-learning-applications
Artificial Intelligence
Example: the Enigma code.
[Diagram: Data and Rules go in; Output comes out.]
Machine Learning
[Diagram: Features and Output feed a black box, which learns the Rules.]
Deep Learning
[Diagram: Raw data and Output feed a black box, which learns the Rules.]
AI and its fields
[Diagram: nested circles — Deep Learning inside Machine Learning inside Artificial Intelligence.]
ML definition
A computer program is said to learn from Experience E with respect to task T and performance measure P, if its performance at task T as measured by P improves with experience E.
[Diagram, built up over several slides: examples pass through a black box, which outputs the learned Rules.]
1. Mitchell T M (1997), Machine Learning. McGraw-Hill, New York
ML overview
Supervised learning
• Regression: Linear regression, Forecasting, Decision trees, SVM, kNN, Ensemble techniques
• Classification: Logistic regression, Naive Bayes, Decision trees, SVM, kNN, Ensemble techniques
ML overview
Unsupervised learning
• Dimensionality reduction (PCA)
• Clustering: k-means, Hierarchical, DBSCAN
• Association rules
ML overview
• Reinforcement learning
• Optimization
Machine Learning – Experience, E
Input $\mathbf{x} \in \mathbb{R}^p$, where $p$ is the number of features in the data.
Each example is a $p$-tuple $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_p)$ with label $y$.
A dataset of $n$ labelled examples stacks these tuples into an $n \times p$ matrix:
$\mathbf{x}_1 = (x_{11}, x_{12}, x_{13}, \ldots, x_{1p})$, label $y_1$
  ⋮
$\mathbf{x}_n = (x_{n1}, x_{n2}, x_{n3}, \ldots, x_{np})$, label $y_n$
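In code, experience E is just an n × p array of feature values plus a label vector. A minimal sketch in Python — every number below is made up for illustration:

import numpy as np

# n = 4 examples, p = 3 features, plus a label vector
X = np.array([
    [5.1, 3.5, 1.4],   # x1 = (x11, x12, x13)
    [4.9, 3.0, 1.4],   # x2
    [6.3, 2.5, 5.0],   # x3
    [5.8, 2.7, 5.1],   # x4
])
y = np.array([0, 0, 1, 1])  # labels y1..y4

n, p = X.shape   # n rows (examples), p columns (features)
print(n, p)      # 4 3
print(X[0])      # the p-tuple for the first example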
Machine Learning – Task, T
• Predict or forecast a value
• Classify an object into one of n given categories
• Anomaly detection
• Transcription
• Translation
• Synthesis of a new exemplar
• Determination of missing values – imputation
• Data cleaning – denoising
• Estimation of a probability mass function or density
• Group objects
• Identify areas of interest in an image – segmentation
• Fastest route between two cities
• Combination of stocks with maximum ROI
• Predict the next product the customer will buy
What is decision making?
• Decision making is the process of identifying and selecting a course of action among several alternatives to achieve a desired outcome.
• Decision making is essential for navigating uncertainties and achieving organizational goals.
Types of decision making
1. Certainty
2. Risk
3. Uncertainty
[Figures illustrating each type; one build shows NIFTY50, mid-cap, and small-cap indices.]
Image Source: Link
Supervised Learning
Linear regression and logistic regression
Supervised learning
• Labelled data – a target column or dependent variable
• Labels can be numerical or categorical
• Numerical data – e.g., Age
• Categorical data – e.g., Type
• Generally, the model assumes some relationship between features and target, e.g., linear and logistic regression
• Applications (identify the right ones): Image Classification, Spam Detection, Customer Segmentation, Network Anomaly, Fraud Detection, House Price Prediction, Handwriting Recognition
Metrics
Regression: MAE, MSE, RMSE, MAPE, R-square
Classification: Accuracy, Recall, Precision, F1-score
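All of these metrics are available in scikit-learn, which this deck uses later. A minimal sketch with made-up true and predicted values:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score,
                             accuracy_score, recall_score, precision_score, f1_score)

# Regression metrics on illustrative values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])
print("MAE :", mean_absolute_error(y_true, y_pred))
print("MSE :", mean_squared_error(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
print("R2  :", r2_score(y_true, y_pred))

# Classification metrics on illustrative labels
c_true = [1, 0, 1, 1, 0, 1]
c_pred = [1, 0, 0, 1, 0, 1]
print("Accuracy :", accuracy_score(c_true, c_pred))
print("Recall   :", recall_score(c_true, c_pred))
print("Precision:", precision_score(c_true, c_pred))
print("F1-score :", f1_score(c_true, c_pred))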
Linear regression
Simple and Multiple
MPG – application in automobile sector
• Suppose you want to launch a new car model and need to estimate its mileage (mpg).
• How would you find it?
Dataset
[Table: the auto-mpg dataset; mpg is the target y and the remaining columns are the predictors X.]
Simple linear regression fits $y = \beta_1 X + \beta_0$.
Model summary (R)
[R output of the simple linear regression of mpg on weight, walked through over several slides.]
$mpg = -0.0077\,(\text{weight}) + 46.31$
What is the p-value?
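A minimal Python sketch of the same simple regression, assuming df is the auto-mpg dataset with 'weight' and 'mpg' columns (as in the statsmodels code later in this deck); the coefficients and p-values come straight out of the fitted model:

import statsmodels.api as sm

X = sm.add_constant(df[['weight']])   # intercept beta0 plus a slope on weight
y = df['mpg']
model = sm.OLS(y, X).fit()

print(model.params)    # const and weight coefficients, as on the slide above
print(model.pvalues)   # p-values testing H0: the coefficient equals 0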
Multiple linear regression
$\hat{y} = \sum_i \beta_i X_i + \beta_0$
Model summary (MLR)
[R output of the multiple linear regression, built up over several slides.]
Mutual fund manager skill
Application
α, β values
How to find $\beta_i$?
$\hat{y} = \beta_1 x_1 + \beta_2 x_2 + \beta_0$
SSE: $\sum (y - \hat{y})^2$
Minimize SSE to get the $\beta$s.
Objective function: $\sum \big(y - (\beta_1 x_1 + \beta_2 x_2 + \beta_0)\big)^2$
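A minimal sketch of this minimization with NumPy: np.linalg.lstsq returns the betas that minimize the SSE for a given design matrix. The data below are synthetic, for illustration only:

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2.0 * x1 - 1.0 * x2 + 5.0 + rng.normal(scale=0.1, size=50)

# Design matrix with a column of ones for beta0
A = np.column_stack([x1, x2, np.ones_like(x1)])

# lstsq finds the betas minimizing ||y - A @ betas||^2, i.e. the SSE
betas, sse, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("beta1, beta2, beta0:", betas)   # close to 2.0, -1.0, 5.0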
Visualization
Assumptions
1. Linearity: The relationship between the independent and dependent variables is linear. This can be checked using scatter plots or residual plots.
2. Independence: Observations are independent of each other. This assumption is often verified through knowledge of the data collection and experiment design.
3. Homoscedasticity: The variance of the residuals (or "errors") should be constant across all levels of the independent variables. A plot of residuals vs. predicted values can help check this.
4. Normality of errors: The residuals should be approximately normally distributed. This can be checked using histograms or Q-Q plots of residuals.
Residual plots
[Panels: residual diagnostics for linearity, normality, and homoscedasticity.]
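A sketch of how these three diagnostic plots might be produced, assuming model is a fitted statsmodels OLS result (as in the code later in this deck):

import matplotlib.pyplot as plt
import statsmodels.api as sm

resid = model.resid
fitted = model.fittedvalues

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Linearity / homoscedasticity: residuals vs. fitted values
axes[0].scatter(fitted, resid, alpha=0.6)
axes[0].axhline(0, color="red")
axes[0].set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs. fitted")

# Normality: histogram of residuals
axes[1].hist(resid, bins=20)
axes[1].set(title="Histogram of residuals")

# Normality: Q-Q plot of residuals against the normal distribution
sm.qqplot(resid, line="45", fit=True, ax=axes[2])
axes[2].set(title="Q-Q plot")

plt.tight_layout()
plt.show()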
Beware of influential points
• Leverage – a measure of how much an observation's independent-variable values differ from the mean of those variables.
• High-residual points – points with large residuals can also be influential.
• Cook's distance – combines leverage and residual to identify influential points; it measures the effect of deleting a given observation.
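A sketch of these diagnostics with statsmodels' influence tools, assuming model is a fitted OLS result; the 4/n cutoff is one common rule of thumb, not a fixed standard:

import numpy as np

influence = model.get_influence()

leverage = influence.hat_matrix_diag     # leverage of each observation
cooks_d, _ = influence.cooks_distance    # Cook's distance per observation

# Flag points with Cook's distance above the 4/n rule of thumb
n = len(cooks_d)
flagged = np.where(cooks_d > 4 / n)[0]
print("Potentially influential rows:", flagged)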
Python code using statsmodels
import pandas as pd
import statsmodels.api as sm

# df: the auto-mpg dataset loaded as a pandas DataFrame
# Define independent variables (X) and dependent variable (y)
X = df[['horsepower', 'weight']]
X = sm.add_constant(X)  # Add a constant column for the intercept
y = df['mpg']

# Fit the linear regression model
model_statsmodels = sm.OLS(y, X).fit()

# Print the summary of the regression
print(model_statsmodels.summary())
Python code using sklearn
import pandas as pd
from sklearn.linear_model import LinearRegression

# df: the auto-mpg dataset loaded as a pandas DataFrame
# Define independent variables (X) and dependent variable (y)
X = df[['horsepower', 'weight']]
y = df['mpg']

# Initialize and fit the linear regression model
model_sklearn = LinearRegression().fit(X, y)

# Print the coefficients and intercept
print("Intercept:", model_sklearn.intercept_)
print("Coefficients:", model_sklearn.coef_)
Multicollinearity
What is multicollinearity?
Variance Inflation Factor (VIF):
$VIF(X_i) = \dfrac{1}{1 - R_i^2}$
where $R_i^2$ is the R-square from regressing $X_i$ on the other predictors.
In practice, a VIF value exceeding 5 or 10 suggests that multicollinearity may be a problem and should be further investigated.
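A sketch of the VIF computation using statsmodels, assuming df holds the auto-mpg predictors used elsewhere in this deck:

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[['horsepower', 'weight']])

# VIF for each predictor (skip the constant column)
for i, col in enumerate(X.columns):
    if col == 'const':
        continue
    print(col, variance_inflation_factor(X.values, i))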
Numerical attributes
Handling Categorical Attributes
One-hot (dummy) encoding of a Qualification column:

Qualification   Qualification_btech   Qualification_mtech   Qualification_phd
Btech           1                     0                     0
Mtech           0                     1                     0
Phd             0                     0                     1
Phd             0                     0                     1
Mtech           0                     1                     0
Mtech           0                     1                     0
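pandas can produce exactly this encoding. A minimal sketch; dropping one dummy column (drop_first=True) is a common way to avoid the dummy-variable trap (perfect multicollinearity with the intercept):

import pandas as pd

df_q = pd.DataFrame({'Qualification': ['Btech', 'Mtech', 'Phd', 'Phd', 'Mtech', 'Mtech']})

# One indicator column per category, as in the table above
print(pd.get_dummies(df_q, columns=['Qualification']))

# Drop one column to avoid the dummy-variable trap
print(pd.get_dummies(df_q, columns=['Qualification'], drop_first=True))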
Transformations – handle non-linear data
[Plots: non-linear data before and after transformation; see the sketch below.]
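One common transformation is taking logs. A minimal sketch on synthetic data where y grows exponentially in x, so a straight line fits log(y) rather than y:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 100)
y = 2.0 * np.exp(0.5 * x) * rng.lognormal(sigma=0.05, size=x.size)

# Fit a line to log(y) instead of y: log(y) ~ b1 * x + b0
b1, b0 = np.polyfit(x, np.log(y), deg=1)
print("slope, intercept on the log scale:", b1, b0)  # roughly 0.5 and log(2)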
Box-Cox transformations
• Learn it on your own
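As a starting point for self-study, a minimal sketch using scipy.stats.boxcox, which also chooses the λ that makes the transformed data look most normal:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.lognormal(mean=0.0, sigma=0.7, size=500)  # right-skewed, all positive

# boxcox requires positive data; it returns the transform and the chosen lambda
transformed, lam = stats.boxcox(data)
print("chosen lambda:", lam)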
Regularization
• Bias–variance tradeoff
[Figure buildup: models of increasing complexity fit to the data, and how each behaves on unseen data.]
Regularization
So, we understand that we should reduce model complexity. What is model complexity?
$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \cdots$
How do we reduce the complexity?
Regularization
Loss function:
$\sum (y - \hat{y})^2$
or
$\sum \big(y - (\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \cdots)\big)^2$
Regularization
Loss function:
$error = \sum (y - \hat{y})^2 + \lambda \sum_i |\beta_i|$
What if $\lambda$ is high?
[Table buildup: how the $\beta_i$ and the error change as $\lambda$ varies.]
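The L1-penalized loss above is what scikit-learn's Lasso implements (scikit-learn calls λ alpha); Ridge uses a squared L2 penalty instead. A minimal sketch on synthetic data:

from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for alpha in [0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Higher alpha (lambda) shrinks more coefficients to exactly zero
    print(alpha, (lasso.coef_ == 0).sum(), "coefficients are zero")

# Ridge penalizes sum(beta_i^2), shrinking coefficients without zeroing them
ridge = Ridge(alpha=1.0).fit(X, y)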
Applications
Index tracker