Lecture 8 Linear Model for Regression
► Regression and linear models
► Batch methods
► Ordinary least squares (OLS)
► Maximum likelihood estimates
► Sequential methods
► Least mean squares (LMS)
► Recursive (sequential) least squares (RLS)
Problem Setup
Given a set of $N$ labeled examples $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with inputs $\mathbf{x}_n$ and real-valued targets $y_n$, the goal is to learn a mapping $f : \mathbf{x} \mapsto y$ which associates $\mathbf{x}$ with $y$, such that we can make a prediction about $y$ when a new input $\mathbf{x}$ is provided.

What is regression analysis?
► Parametric regression: Assume a functional form for $f(\mathbf{x})$ (e.g., linear models).
► Nonparametric regression: Do not assume a functional form for $f(\mathbf{x})$.
In this lecture we focus on parametric regression.
Regression
► Regression aims at modeling the dependence of a response $Y$ on a covariate $X$. In other words, the goal of regression is to predict the value of one or more continuous target variables $y$ given the value of the input vector $\mathbf{x}$.
► The regression model is described by
  $y = f(\mathbf{x}) + \epsilon$.
► Terminology:
  ► $\mathbf{x}$: input, independent variable, predictor, regressor, covariate
  ► $y$: output, dependent variable, response
► The dependence of a response on a covariate is captured via a conditional probability distribution, $p(y \mid \mathbf{x})$.
► Depending on $f(\mathbf{x})$,
  ► Linear regression with basis functions: $f(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$.
  ► Linear regression with kernels: $f(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n k(\mathbf{x}, \mathbf{x}_n)$.

Regression Function: Conditional Mean
We consider the mean squared error and find the MMSE estimate:
  $f^{*}(\mathbf{x}) = \arg\min_{f} \mathbb{E}\left[ \left( y - f(\mathbf{x}) \right)^2 \right] = \mathbb{E}[y \mid \mathbf{x}]$.
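To see why the conditional mean is the MMSE estimate, a standard decomposition (sketched here, not reproduced verbatim from the slide) is:
\begin{aligned}
\mathbb{E}\left[ (y - f(\mathbf{x}))^2 \right]
&= \mathbb{E}\left[ \left( y - \mathbb{E}[y \mid \mathbf{x}] + \mathbb{E}[y \mid \mathbf{x}] - f(\mathbf{x}) \right)^2 \right] \\
&= \mathbb{E}\left[ \left( y - \mathbb{E}[y \mid \mathbf{x}] \right)^2 \right] + \mathbb{E}\left[ \left( \mathbb{E}[y \mid \mathbf{x}] - f(\mathbf{x}) \right)^2 \right],
\end{aligned}
since the cross term vanishes after conditioning on $\mathbf{x}$. The first term does not depend on $f$, and the second term is minimized by choosing $f(\mathbf{x}) = \mathbb{E}[y \mid \mathbf{x}]$.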
Linear Regression

Why Linear Models?
► Built on well-developed linear transformations.
► Can be solved analytically.
► Yield some interpretability (in contrast to deep learning).
Linear Regression
Linear regression refers to a model in which the conditional mean of $y$ given the value of $\mathbf{x}$ is an affine function of the basis functions $\boldsymbol{\phi}(\mathbf{x})$:
  $f(\mathbf{x}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}),$
where the $\phi_j(\mathbf{x})$ are known as basis functions and $\phi_0(\mathbf{x}) = 1$.
By using nonlinear basis functions, we allow the function $f(\mathbf{x})$ to be a nonlinear function of the input vector $\mathbf{x}$ (but a linear function of $\mathbf{w}$).

Polynomial Regression
[Figure source: Bishop's PRML]
Basis Functions
► Polynomial regression: $\phi_j(x) = x^{j}$.
► Gaussian basis functions: $\phi_j(x) = \exp\!\left( -\frac{(x - \mu_j)^2}{2 s^2} \right)$.
► Spline basis functions: Piecewise polynomials (divide the input space up into regions and fit a different polynomial in each region).
► Many other possible basis functions: sigmoidal basis functions, hyperbolic tangent basis functions, Fourier basis, wavelet basis, and so on (a small sketch of the first two follows).
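As an illustration, here is a minimal sketch of how polynomial and Gaussian basis functions turn a scalar input into a design matrix Φ. This is not from the lecture; the helper names (poly_design_matrix, gaussian_design_matrix), the grid of Gaussian centers, and the width s = 0.2 are assumptions made for the example.

import numpy as np

def poly_design_matrix(x, degree):
    # Columns are phi_j(x) = x**j for j = 0, ..., degree (phi_0 = 1 is the bias column).
    return np.vander(x, N=degree + 1, increasing=True)

def gaussian_design_matrix(x, centers, s):
    # Columns are phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a leading bias column.
    rbf = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), rbf])

x = np.linspace(0.0, 1.0, 10)                                             # N = 10 scalar inputs
Phi_poly = poly_design_matrix(x, degree=3)                                # shape (10, 4)
Phi_rbf = gaussian_design_matrix(x, centers=np.linspace(0, 1, 5), s=0.2)  # shape (10, 6)
print(Phi_poly.shape, Phi_rbf.shape)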
Ordinary Least Squares
Loss function view
Least Squares Method
Given a set of training data $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, we determine the weight vector $\mathbf{w}$ which minimizes
  $E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 = \frac{1}{2} \left\| \mathbf{y} - \boldsymbol{\Phi} \mathbf{w} \right\|^2,$
where $\mathbf{y} = [y_1, \ldots, y_N]^\top$ and $\boldsymbol{\Phi} = [\boldsymbol{\phi}(\mathbf{x}_1), \ldots, \boldsymbol{\phi}(\mathbf{x}_N)]^\top \in \mathbb{R}^{N \times M}$ is known as the design matrix.

Find the estimate $\widehat{\mathbf{w}}$ such that
  $\widehat{\mathbf{w}} = \arg\min_{\mathbf{w}} \frac{1}{2} \left\| \mathbf{y} - \boldsymbol{\Phi} \mathbf{w} \right\|^2,$
where both $\mathbf{y}$ and $\boldsymbol{\Phi}$ are given.
How do you find the minimizer $\widehat{\mathbf{w}}$? Solve $\nabla_{\mathbf{w}} E(\mathbf{w}) = \mathbf{0}$ for $\mathbf{w}$.
Note that
  $E(\mathbf{w}) = \frac{1}{2} (\mathbf{y} - \boldsymbol{\Phi}\mathbf{w})^\top (\mathbf{y} - \boldsymbol{\Phi}\mathbf{w})$.
Then, we have
  $\nabla_{\mathbf{w}} E(\mathbf{w}) = \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \mathbf{w} - \boldsymbol{\Phi}^\top \mathbf{y}$.
Therefore, $\nabla_{\mathbf{w}} E(\mathbf{w}) = \mathbf{0}$ leads to the normal equation, which is of the form
  $\boldsymbol{\Phi}^\top \boldsymbol{\Phi} \mathbf{w} = \boldsymbol{\Phi}^\top \mathbf{y}$.
Thus, the LS estimate of $\mathbf{w}$ is given by
  $\widehat{\mathbf{w}}_{\mathrm{LS}} = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top \mathbf{y} = \boldsymbol{\Phi}^{\dagger} \mathbf{y},$
where $\boldsymbol{\Phi}^{\dagger} = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top$ is known as the Moore-Penrose pseudo-inverse.
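This closed-form solution can be sketched in a few lines of NumPy (an illustrative example rather than the lecture's own code; the synthetic sine data and the polynomial degree are assumptions). Numerically, np.linalg.lstsq or np.linalg.pinv is preferred over forming and inverting Φ^T Φ explicitly.

import numpy as np

rng = np.random.default_rng(0)
N, degree = 50, 3

# Synthetic data: y = sin(2*pi*x) + Gaussian noise
x = rng.uniform(0.0, 1.0, size=N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# Design matrix with polynomial basis functions phi_j(x) = x**j
Phi = np.vander(x, N=degree + 1, increasing=True)       # shape (N, M)

# Normal-equation solution: w = (Phi^T Phi)^{-1} Phi^T y
w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Equivalent pseudo-inverse / least-squares routes
w_pinv = np.linalg.pinv(Phi) @ y
w_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.allclose(w_normal, w_pinv), np.allclose(w_pinv, w_lstsq))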
Least Squares
Probabilistic model view with MLE

Maximum Likelihood
We consider a linear model where the target variable $y_n$ is assumed to be generated by a deterministic function with additive Gaussian noise:
  $y_n = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) + \epsilon_n,$
for $n = 1, \ldots, N$ and $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$.
In a compact form, we have
  $\mathbf{y} = \boldsymbol{\Phi} \mathbf{w} + \boldsymbol{\epsilon}$.
In other words, we model $p(y_n \mid \mathbf{x}_n)$ as
  $p(y_n \mid \mathbf{x}_n, \mathbf{w}, \sigma^2) = \mathcal{N}\!\left( y_n \mid \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n), \sigma^2 \right)$.
The log-likelihood is given by
  $\ln p(\mathbf{y} \mid \boldsymbol{\Phi}, \mathbf{w}, \sigma^2) = -\frac{N}{2} \ln\left( 2 \pi \sigma^2 \right) - \frac{1}{2 \sigma^2} \left\| \mathbf{y} - \boldsymbol{\Phi} \mathbf{w} \right\|^2.$
The MLE is given by
  $\widehat{\mathbf{w}}_{\mathrm{ML}} = \arg\max_{\mathbf{w}} \ln p(\mathbf{y} \mid \boldsymbol{\Phi}, \mathbf{w}, \sigma^2),$
leading to
  $\widehat{\mathbf{w}}_{\mathrm{ML}} = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top \mathbf{y} = \widehat{\mathbf{w}}_{\mathrm{LS}},$
which is exactly the least squares estimate we arrived at, now justified under the Gaussian noise assumption.
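The equivalence can be checked numerically with a small sketch (not from the lecture; the synthetic data, the fixed noise level, and the use of scipy.optimize.minimize are assumptions): minimizing the Gaussian negative log-likelihood directly recovers the closed-form least squares estimate.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, M, sigma = 100, 4, 0.3

Phi = rng.standard_normal((N, M))                    # some design matrix
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = Phi @ w_true + sigma * rng.standard_normal(N)    # y = Phi w + eps, eps ~ N(0, sigma^2)

def neg_log_likelihood(w):
    # -log p(y | Phi, w, sigma^2), dropping terms that do not depend on w
    resid = y - Phi @ w
    return 0.5 * resid @ resid / sigma**2

w_ml = minimize(neg_log_likelihood, x0=np.zeros(M)).x
w_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(w_ml, w_ls, atol=1e-4))            # True: MLE coincides with the LS estimate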
Sequential Methods
LMS and RLS
Online Learning
A method of machine learning in which data becomes available in a sequential order and is used to update our best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. [Source: Wikipedia]

Mean Squared Error (MSE)
We are interested in the MMSE estimate:
  $\widehat{\mathbf{w}} = \arg\min_{\mathbf{w}} \mathbb{E}\left[ \left( y - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) \right)^2 \right].$
Since the true expectation is unavailable in practice, it is approximated by either of the following:
► Sample average: $\frac{1}{N} \sum_{n=1}^{N} \left( y_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2$
► Instantaneous squared error: $\left( y_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2$
Least Mean Squares (LMS)
LMS is a gradient-descent method which minimizes the instantaneous squared error
  $E_n(\mathbf{w}) = \frac{1}{2} \left( y_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2.$
The gradient descent method leads to the updating rule for $\mathbf{w}$, which is of the form
  $\mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} + \eta \left( y_n - {\mathbf{w}^{(n)}}^{\top} \boldsymbol{\phi}(\mathbf{x}_n) \right) \boldsymbol{\phi}(\mathbf{x}_n),$
where $\eta > 0$ is the learning rate. [Widrow and Hoff, 1960]
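A minimal sketch of this update on a synthetic data stream (illustrative only; the fixed learning rate, the noise level, and the random features are assumptions):

import numpy as np

rng = np.random.default_rng(2)
M, eta = 3, 0.05                      # number of basis functions, learning rate
w_true = np.array([0.5, -1.0, 2.0])
w = np.zeros(M)                       # LMS estimate, updated one sample at a time

for n in range(5000):
    phi = rng.standard_normal(M)                     # phi(x_n) for the newly arrived sample
    y = w_true @ phi + 0.1 * rng.standard_normal()   # noisy target y_n
    error = y - w @ phi                              # instantaneous prediction error
    w = w + eta * error * phi                        # Widrow-Hoff (LMS) update

print(w)   # approaches w_true for a sufficiently small learning rate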
Recursive (Sequential) LS
We approximate the MSE by a weighted sample average and introduce the forgetting factor $\lambda \in (0, 1]$ to de-emphasize old samples, leading to the following error function:
  $E_n(\mathbf{w}) = \sum_{i=1}^{n} \lambda^{\,n-i} \left( y_i - \mathbf{w}^\top \boldsymbol{\phi}_i \right)^2, \quad \text{where } \boldsymbol{\phi}_i = \boldsymbol{\phi}(\mathbf{x}_i).$
Solving $\nabla_{\mathbf{w}} E_n(\mathbf{w}) = \mathbf{0}$ for $\mathbf{w}_n$ leads to
  $\mathbf{w}_n = \left( \sum_{i=1}^{n} \lambda^{\,n-i} \boldsymbol{\phi}_i \boldsymbol{\phi}_i^\top \right)^{-1} \sum_{i=1}^{n} \lambda^{\,n-i} y_i \boldsymbol{\phi}_i.$
We define
  $\mathbf{P}_n = \left( \sum_{i=1}^{n} \lambda^{\,n-i} \boldsymbol{\phi}_i \boldsymbol{\phi}_i^\top \right)^{-1} \quad \text{and} \quad \mathbf{r}_n = \sum_{i=1}^{n} \lambda^{\,n-i} y_i \boldsymbol{\phi}_i.$
With these definitions, we have
  $\mathbf{w}_n = \mathbf{P}_n \mathbf{r}_n, \qquad \mathbf{P}_n^{-1} = \lambda \mathbf{P}_{n-1}^{-1} + \boldsymbol{\phi}_n \boldsymbol{\phi}_n^\top, \qquad \mathbf{r}_n = \lambda \mathbf{r}_{n-1} + y_n \boldsymbol{\phi}_n.$
The core idea of RLS is to apply the matrix inversion lemma
  $\left( \mathbf{A} + \mathbf{b} \mathbf{c}^\top \right)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1} \mathbf{b} \, \mathbf{c}^\top \mathbf{A}^{-1}}{1 + \mathbf{c}^\top \mathbf{A}^{-1} \mathbf{b}}$
to develop the sequential algorithm without matrix inversion.
The recursion for $\mathbf{P}_n$ is given by
  $\mathbf{P}_n = \frac{1}{\lambda} \left( \mathbf{P}_{n-1} - \frac{\mathbf{P}_{n-1} \boldsymbol{\phi}_n \boldsymbol{\phi}_n^\top \mathbf{P}_{n-1}}{\lambda + \boldsymbol{\phi}_n^\top \mathbf{P}_{n-1} \boldsymbol{\phi}_n} \right).$
Thus, the updating rule for $\mathbf{w}$ is given by
  $\mathbf{w}_n = \mathbf{w}_{n-1} + \mathbf{k}_n \left( y_n - \mathbf{w}_{n-1}^\top \boldsymbol{\phi}_n \right), \quad \text{where } \mathbf{k}_n = \mathbf{P}_n \boldsymbol{\phi}_n = \frac{\mathbf{P}_{n-1} \boldsymbol{\phi}_n}{\lambda + \boldsymbol{\phi}_n^\top \mathbf{P}_{n-1} \boldsymbol{\phi}_n}.$
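These recursions translate directly into a short sketch (illustrative only; the forgetting factor, the large diagonal initialization of P_0, and the synthetic stream are assumptions):

import numpy as np

rng = np.random.default_rng(3)
M, lam = 3, 0.99                      # number of basis functions, forgetting factor
w_true = np.array([0.5, -1.0, 2.0])

w = np.zeros(M)                       # RLS estimate
P = 1e3 * np.eye(M)                   # P_0: large initial "covariance" (a common initialization)

for n in range(2000):
    phi = rng.standard_normal(M)                     # phi(x_n)
    y = w_true @ phi + 0.1 * rng.standard_normal()   # noisy target y_n

    # Gain vector and P update via the matrix inversion lemma (no explicit matrix inversion)
    k = P @ phi / (lam + phi @ P @ phi)
    P = (P - np.outer(k, phi @ P)) / lam

    # Weight update driven by the a priori prediction error
    w = w + k * (y - w @ phi)

print(w)   # converges to w_true much faster than LMS, at a higher per-step cost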