
Linear Regression

The Foundation of Predictive Modeling


Introduction

➢ Linear Regression is a supervised learning algorithm used to predict continuous outcomes by modeling the relationship between dependent and independent variables.

➢ Common applications: predicting house prices, stock trends, sales forecasting, etc.

➢ More broadly, in Linear Regression we are always looking for the best-fit line.


Types of Linear Regression:
➢ Simple Linear Regression

➢ Multiple Linear Regression


Assumptions of Linear Regression:
➢ Linearity: the relationship between the inputs and the output is linear.
➢ Independence: the residuals are independent of one another.
➢ Homoscedasticity: the residuals have constant variance.
➢ Normality: the residuals are approximately normally distributed.
➢ No (or little) multicollinearity among the independent variables.
Additional Considerations:
Journey of Regression from Stats to ML:
➢ Linear Regression models the relationship between the response variable (also known as the dependent, target, or outcome variable, denoted y) and the regression coefficients (denoted βi or wi).

➢ The relationship is assumed to be linear: the output y can be expressed as a linear combination of the input features x and the coefficients, y = w0 + w1·x1 + w2·x2 + … + wn·xn.

➢ Regression Coefficients:
✓ These coefficients (βi, wi) are the weights assigned to each input feature x.
✓ They determine how much each input feature contributes to the output y.
Graphical Representation:
➢ Scatter Plot of Data Points: we'll plot the given data points, where each point represents a pair of weight and height values.
➢ Best-Fit Line: we'll draw the line that best fits the data points according to the linear regression model. This line minimizes the sum of the squared errors between the actual data points and the line.
➢ Residuals (Errors): we'll show the vertical distances (residuals) between the data points and the regression line, highlighting the concept of minimizing these distances.
Model Parameters:
➢ Slope (w1): let it be 0.361 (approximately). This means that for every unit increase in weight, the height is expected to increase by about 0.361 units.
➢ Intercept (w0): let it be 9.11 (approximately). This is the height when the weight is zero. While it might not be meaningful in a practical context, it is essential for defining the regression line.
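As a quick sanity check, these two parameters fully define the prediction rule. A minimal sketch using the slide's approximate values (the input weight of 60 is just illustrative):

# Best-fit line from the slide: height = w0 + w1 * weight
w0, w1 = 9.11, 0.361  # intercept and slope (approximate values from the slide)

def predict_height(weight):
    """Predict height from weight using the fitted line."""
    return w0 + w1 * weight

print(predict_height(60))  # 9.11 + 0.361 * 60 = 30.77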
Geometric Intuition:
➢ The best-fit line attempts to capture the linear relationship between weight and height. The slope indicates the direction and steepness of this relationship.
➢ Minimizing the residuals ensures that the line is as close as possible to all the data points, providing the best possible predictions.
Finding a line to best fit the data points using Ordinary Least Squares (OLS) regression

The concept of residuals and squared errors in the context of linear regression
Key Points:
➢ The best-fit line is determined by minimizing the sum of the squared vertical distances between the actual data points and the predicted values on the line.
➢ The residuals show how far off the predictions are from the actual data points.
➢ Why do we use the square of the error? In linear regression, we often use the square of the errors, rather than the errors themselves, to measure how well the model fits the data. This is called the Sum of Squared Errors (SSE): SSE = Σ(yi − ŷi)².
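A minimal NumPy sketch of residuals and the SSE; the data arrays here are made up for illustration:

import numpy as np

# Illustrative data: actual targets and model predictions
y_actual = np.array([10.0, 12.0, 15.0, 18.0])
y_pred   = np.array([ 9.5, 12.5, 14.0, 18.5])

residuals = y_actual - y_pred   # vertical distances to the line
sse = np.sum(residuals ** 2)    # Sum of Squared Errors
print(residuals, sse)           # [ 0.5 -0.5  1.  -0.5]  1.75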
Why Not Use Absolute Errors?
➢ Squared errors penalize large mistakes more heavily and give a loss that is smooth and differentiable everywhere, which yields a simple closed-form solution and works well with gradient-based optimization.
➢ The absolute error is not differentiable at zero and has no closed-form solution for the best-fit line, making optimization less convenient.
R-squared: Coefficient of Determination
➢ R² = 1 − (SSres / SStot): the proportion of the variance in y that is explained by the model.
➢ Ordinary Least Squares (OLS): finds the best-fit line directly, via a closed-form solution.
➢ Gradient Descent: finds the best-fit line iteratively, by repeatedly reducing the error.
Math to Find the Slope and Intercept
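For simple linear regression, the standard OLS closed-form formulas are w1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and w0 = ȳ − w1·x̄. A minimal sketch, with illustrative weight/height data:

import numpy as np

x = np.array([50.0, 60.0, 70.0, 80.0])   # e.g., weights (illustrative)
y = np.array([27.0, 31.0, 34.5, 38.0])   # e.g., heights (illustrative)

x_mean, y_mean = x.mean(), y.mean()
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
w0 = y_mean - w1 * x_mean                                             # intercept
print(w0, w1)   # 8.9, 0.365 -- close to the slide's 9.11 and 0.361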
➢ Identification of significant variables can be done during Exploratory Data Analysis (EDA) as well as during model building.
GRADIENT DESCENT APPROACH:
➢ Gradient Descent is a very generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of Gradient Descent is to tweak parameters iteratively in order to minimize a cost function.
➢ Suppose you are lost in the mountains in a dense fog; you can only feel the slope of the ground below your feet. A good strategy to get to the bottom of the valley quickly is to go downhill in the direction of the steepest slope.
➢ This is exactly what Gradient Descent does: it measures the local gradient of the error function with respect to the parameter vector θ, and it goes in the direction of the descending gradient. Once the gradient is zero, you have reached a minimum!
➢ So you start by filling θ with random values (this is called random initialization), and then you improve it gradually, taking one baby step at a time, each step attempting to decrease the cost function (e.g., the MSE), until the algorithm converges to a minimum.
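A minimal sketch of batch Gradient Descent for a simple linear regression, following the recipe above (random initialization, then repeated steps against the MSE gradient); the data, learning rate, and iteration count are illustrative:

import numpy as np

rng = np.random.default_rng(0)
x = np.array([50.0, 60.0, 70.0, 80.0])
y = np.array([27.0, 31.0, 34.5, 38.0])
x_s = (x - x.mean()) / x.std()          # standardize x for stable convergence

w0, w1 = rng.normal(size=2)             # random initialization of theta
lr = 0.1                                # learning rate hyperparameter

for _ in range(1000):
    error = (w0 + w1 * x_s) - y         # prediction error on the full batch
    grad_w0 = 2 * error.mean()          # dMSE/dw0
    grad_w1 = 2 * (error * x_s).mean()  # dMSE/dw1
    w0 -= lr * grad_w0                  # step in the descending direction
    w1 -= lr * grad_w1

print(w0, w1)                           # parameters in standardized-x units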

What is Gradient Descent?

➢ An important parameter in Gradient Descent is the size of the steps, determined by the learning rate hyperparameter.

➢ If the learning rate is too small, the algorithm will have to go through many iterations to converge, which will take a long time.

➢ On the other hand, if the learning rate is too high, you might jump across the valley and end up on the other side, possibly even higher up than you were before. This might make the algorithm diverge, with larger and larger values, failing to find a good solution.
➢ The two main challenges with Gradient Descent (picture a cost curve with a local minimum on the left and a long plateau on the right): if the random initialization starts the algorithm on the left, it will converge to a local minimum, which is not as good as the global minimum.

➢ If it starts on the right, it will take a very long time to cross the plateau, and if you stop too early you will never reach the global minimum.

➢ Fortunately, the MSE cost function for a Linear Regression model happens to be a convex function, which means that if you pick any two points on the curve, the line segment joining them never crosses the curve.

➢ This implies that there are no local minima, just one global minimum. It is also a continuous function with a slope that never changes abruptly.
Derivative
➢ The derivative of the loss (or cost) with respect to a weight is called the slope.
➢ Its sign tells us in which direction we need to move to reach the point where the loss is minimum.
➢ The derivative of the loss with respect to an n-dimensional vector is called the gradient.
➢ An n-dimensional vector is called a tensor.
➢ In calculus, the derivative of a tensor is itself a tensor.
➢ In machine learning, data with n features is represented as a tensor.
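For a model with several weights, the gradient collects one partial derivative per weight; for MSE it is ∇MSE(w) = (2/n)·Xᵀ(Xw − y). A minimal NumPy sketch with illustrative shapes:

import numpy as np

n, d = 100, 3
rng = np.random.default_rng(1)
X = rng.normal(size=(n, d))            # n samples, d features (a 2-D tensor)
y = X @ np.array([1.0, -2.0, 0.5])     # illustrative targets
w = np.zeros(d)                        # current weight vector

grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of MSE w.r.t. w
print(grad.shape)                      # (3,) -- one partial derivative per weight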
Gradient Descent: Types
Stochastic Gradient Descent (SGD)
Batch Gradient Descent
Mini-Batch Gradient Descent
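These three variants differ only in how much data feeds each update: Batch GD uses the full dataset per step, SGD uses a single sample, and Mini-Batch GD uses a small random subset. A minimal mini-batch sketch (illustrative data; batch_size=1 would give SGD, batch_size=len(X) would give Batch GD):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr, batch_size = 0.05, 16

for epoch in range(50):
    idx = rng.permutation(len(X))             # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = (2.0 / len(batch)) * Xb.T @ (Xb @ w - yb)
        w -= lr * grad                        # one mini-batch update

print(w)   # should approach [1.0, -2.0, 0.5]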
Linear Regression and Optimization

➢ Linear regression aims to minimize the squared loss, which measures the discrepancy between the actual and predicted values.

➢ The squared loss function is fundamental in regression analysis for evaluating the performance of a model.
Overfitting, Underfitting, and Best Fit

➢ Threshold Accuracy: an accuracy threshold of 70-95% (or 0.7-0.95) is desired; this is the target range for acceptable model performance.
Regularization

Types of Regularization:
➢ Ridge Regression (L2): adds a penalty proportional to the sum of the squared coefficients, shrinking them toward zero.
➢ Lasso Regression (L1): adds a penalty proportional to the sum of the absolute coefficients, which can shrink some of them exactly to zero.
➢ Elastic Net: combines the L1 and L2 penalties.
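A minimal scikit-learn sketch of the L2 and L1 penalties; the synthetic data and the alpha values are illustrative, not tuned:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)  # L1: can set coefficients exactly to zero

print(ridge.coef_)
print(lasso.coef_)                  # note the exact zeros on irrelevant features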
Application and Interpretation:
➢ When evaluating a linear regression model, several error metrics help determine the model's performance. Each serves a slightly different purpose.
➢ The order of accuracy typically depends on the sensitivity of the metric to outliers and the emphasis on specific error magnitudes.
➢ Here is a brief overview of the key error metrics and when to use them:
✓ MAE (Mean Absolute Error): the average absolute difference between predictions and actuals; robust to outliers.
✓ MSE (Mean Squared Error): the average squared difference; penalizes large errors more heavily.
✓ RMSE (Root Mean Squared Error): the square root of the MSE; in the same units as the target.
✓ R² (Coefficient of Determination): the proportion of variance in the target explained by the model.
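A minimal sketch computing these metrics with scikit-learn; y_true and y_pred are illustrative:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 10.5])

mae  = mean_absolute_error(y_true, y_pred)  # robust to outliers
mse  = mean_squared_error(y_true, y_pred)   # penalizes large errors
rmse = np.sqrt(mse)                         # same units as the target
r2   = r2_score(y_true, y_pred)             # proportion of variance explained
print(mae, mse, rmse, r2)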

Evaluation of a Regression Model:


Variance Inflation Factor (VIF): measures how much the variance of a coefficient is inflated by multicollinearity among the predictors; values above roughly 5-10 signal problematic collinearity.

Durbin-Watson Test: checks for autocorrelation in the residuals; values near 2 indicate little autocorrelation.
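A minimal statsmodels sketch of both diagnostics; the data frame and the OLS fit are illustrative:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y = 2 * X["x1"] - X["x2"] + rng.normal(size=100)

# VIF: one value per predictor (computed with a constant term included);
# values above roughly 5-10 suggest problematic multicollinearity
X_const = sm.add_constant(X)
vifs = {col: variance_inflation_factor(X_const.values, i)
        for i, col in enumerate(X_const.columns) if col != "const"}
print(vifs)

# Durbin-Watson on the residuals of an OLS fit; ~2 means little autocorrelation
model = sm.OLS(y, X_const).fit()
print(durbin_watson(model.resid))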
Train-Test Split
Cross-Validation
Combining Cross-Validation with a Holdout Set
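A minimal sketch combining the two ideas: first set aside a holdout test set, then cross-validate on the training portion only; the synthetic data is illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Holdout set: kept aside until the very end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 5-fold cross-validation on the training portion only
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5, scoring="r2")
print(cv_scores.mean())

# Final, one-time evaluation on the untouched holdout set
final = LinearRegression().fit(X_train, y_train)
print(final.score(X_test, y_test))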
Example:
➢ R-squared (R²) ≈ 0.964
➢ This R² value indicates that approximately 96.43% of the variance in salary can be explained by the linear relationship with experience in this model.
Case Study: USAHousing Price Prediction

➢ Introduction
The real estate market is influenced by various factors, including income levels, house age, number of rooms, number of bedrooms, and population density. Understanding how these factors affect house prices can provide valuable insights for buyers, sellers, and real estate professionals. In this project, we aim to develop a predictive model to estimate house prices based on various features in the USAHousing dataset.

➢ Dataset Description
The USAHousing dataset contains information on various attributes related to houses in different areas. The features included in the dataset are:
▪ Avg. Area Income: the average income of residents in the area.
▪ Avg. Area House Age: the average age of houses in the area.
▪ Avg. Area Number of Rooms: the average number of rooms in houses in the area.
▪ Avg. Area Number of Bedrooms: the average number of bedrooms in houses in the area.
▪ Area Population: the population of the area.
▪ Price: the price of the house.
▪ Address: the address of the house (considered a non-significant variable; it will be excluded from the model).
Objective
➢ The primary objective of this project is to build a robust predictive model that can accurately estimate the price of a house based on the following independent variables:
1. Avg. Area Income
2. Avg. Area House Age
3. Avg. Area Number of Rooms
4. Avg. Area Number of Bedrooms
5. Area Population
Methodology
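A minimal end-to-end sketch of the workflow this case study describes; the file name USA_Housing.csv and the exact column strings are assumptions based on the dataset description above:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Assumed file/column names -- adjust to match the actual dataset
df = pd.read_csv("USA_Housing.csv")
features = ["Avg. Area Income", "Avg. Area House Age",
            "Avg. Area Number of Rooms", "Avg. Area Number of Bedrooms",
            "Area Population"]
X = df[features]          # Address is excluded as a non-significant variable
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R^2 :", r2_score(y_test, y_pred))
print(dict(zip(features, model.coef_)))   # per-feature contribution to price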
Conclusion

▪ Predicting house prices is a complex task that involves understanding various factors that influence the real estate market.
▪ By leveraging machine learning techniques, we aim to build a reliable model that can provide accurate price estimates and valuable insights into the housing market.
