M02 Linear Regression Methods
• Polynomial features
• Regularization to handle overfitting
Linear regression with one input variable (simple regression)
We want to predict the house price (in $10k) as a linear function of its living area (in 1k sq. ft) in King County:
• Here we have 100 data points, as plotted in the figure. We want to learn a linear function from this dataset, mapping from "area" to "price". How do we represent the hypothesis?
• So, what is the optimal linear function (a line drawn in the figure) that best fits the given data? This requires a loss function definition (both are written out below).
• How do we find the optimal hypothesis? Via gradient descent or the normal equation.
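For concreteness, a standard way to write these two pieces for the one-variable case (a sketch of the usual definitions; the exact scaling constant may differ from this module's EQ. 1 and EQ. 2 setup):

$$h_\theta(x) = \theta_0 + \theta_1 x, \qquad J(\theta_0, \theta_1) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$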
Example: House price as a linear function of house living area
Initialize $\theta$: choose a starting value for $(\theta_0, \theta_1)$.
Then loop till convergence/termination:
Compute $\partial J(\theta)/\partial \theta_0$ and $\partial J(\theta)/\partial \theta_1$. Here $\partial J(\theta)/\partial \theta_0$ and $\partial J(\theta)/\partial \theta_1$ are computed using EQ. 1 and EQ. 2.
Update $\theta$: $\theta_j := \theta_j - \alpha\,\partial J(\theta)/\partial \theta_j$ for $j = 0, 1$.
Note that this gradient descent algorithm is easily generalized to multivariate linear regression.
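A minimal NumPy sketch of this loop for the one-variable case (variable names, the toy data, and the $\frac{1}{2n}$ loss scaling are assumptions, since EQ. 1 and EQ. 2 are not reproduced in this section):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000, tol=1e-8):
    """Fit h(x) = t0 + t1*x by batch gradient descent.

    Assumes the squared-error loss J = (1/(2n)) * sum((h(x_i) - y_i)^2),
    so the two gradients below play the role of EQ. 1 and EQ. 2.
    """
    t0, t1 = 0.0, 0.0                  # initialize theta (starting point)
    for _ in range(iters):
        err = (t0 + t1 * x) - y        # residuals h(x_i) - y_i for all points
        g0 = err.mean()                # dJ/d(theta_0)
        g1 = (err * x).mean()          # dJ/d(theta_1)
        t0 -= alpha * g0               # simultaneous update of both parameters
        t1 -= alpha * g1
        if max(abs(g0), abs(g1)) < tol:  # terminate once the gradient is ~0
            break
    return t0, t1

# Toy usage: areas in 1k sq. ft, prices in $10k (made-up numbers)
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([25.0, 32.0, 41.0, 48.0, 55.0])
print(gradient_descent(x, y))
```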
Intuition about gradient descent
Here we have two parameters, $\theta_0$ and $\theta_1$.
[Figure: plot of $J(\theta)$ over the two parameters; the gradient descent direction at the point (2, 2) is indicated in the figure.]
Effect of learning rate choices – too small
A learning rate $\alpha$ that is too low makes the step size very small, and convergence takes too long.
Effect of learning rate choices – too big
A value of $\alpha$ that is too big leads to divergence.
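Both failure modes can be seen by rerunning the gradient descent sketch above with different learning rates (the specific values are only illustrative for the toy dataset; the divergence threshold depends on the data):

```python
# Reusing gradient_descent() and the toy data (x, y) from the sketch above.
for alpha in (0.001, 0.1, 1.5):   # too small / moderate / too big (illustrative)
    t0, t1 = gradient_descent(x, y, alpha=alpha, iters=100)
    print(f"alpha={alpha}: theta0={t0:.4g}, theta1={t1:.4g}")
# With alpha=0.001 the estimates barely move in 100 iterations (slow convergence);
# with alpha=1.5 the updates overshoot on every step and the values blow up.
```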
Multivariate linear regression

$$h_\theta(x) = \theta^\top x = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d \quad (\text{with } x_0 = 1)$$

$$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}$$

Each row of matrix $X$ represents a data point.
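A small NumPy sketch of assembling these arrays (the feature values are made up for illustration; the leading column of ones supplies the intercept term $\theta_0$):

```python
import numpy as np

# Made-up data: living area (1k sq. ft) and bedroom count for 4 houses
features = np.array([[1.0, 2],
                     [1.5, 3],
                     [2.0, 3],
                     [3.0, 4]])
Y = np.array([25.0, 33.0, 40.0, 57.0])   # prices in $10k (made up)

# Prepend a column of ones so row i is (1, x_1^(i), ..., x_d^(i))
X = np.hstack([np.ones((features.shape[0], 1)), features])
print(X.shape)   # (4, 3): each row of X is one data point
```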
Normal equation
In matrix format: $J(\theta) = \frac{1}{2}\,(X\theta - Y)^\top (X\theta - Y)$, so $\nabla_\theta J(\theta) = X^\top (X\theta - Y)$; setting the eq. to 0:

$$X^\top X\,\theta = X^\top Y \quad\Rightarrow\quad \theta = \left(X^\top X\right)^{-1} X^\top Y$$
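A sketch of solving this in NumPy, reusing X and Y from the snippet above; solving the linear system is preferred over forming the explicit inverse for numerical stability:

```python
# Solve the normal equation X^T X theta = X^T Y (reusing X, Y from above).
theta = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta)   # (theta_0, theta_1, ..., theta_d)

# Equivalent via least squares, which also handles rank-deficient X:
theta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
```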