ECE 3040 Lecture 18: Curve Fitting by Least-Squares-Error Regression
Introduction
Linear least-squares-Error (LSE) regression: The straight-line model
Linearization of nonlinear models
General linear LSE regression and the polynomial model
Polynomial regression with Matlab: polyfit
Non-linear LSE regression
Numerical solution of the non-linear LSE optimization problem:
Gradient search and Matlab’s fminsearch function
Solution of differential equations based on LSE minimization
Appendix: Explicit matrix formulation for the quadratic regression
problem
Introduction
In the previous lecture, polynomial and cubic spline interpolation methods were
introduced for estimating a value between a given set of precise data points. The
idea was to "fit" (interpolate) a function that passes exactly through all of the data
points. Many engineering and scientific observations are made by
conducting experiments in which physical quantities are measured and recorded as
inexact (noisy) data points. In this case, the objective would be to find the best-fit
analytic curve (model) that approximates the underlying functional relationship
present in the data set. Here, the best-fit curve is not required to pass through the
data points, but it is required to capture the shape (general trend) of the data. This
curve fitting problem is referred to as regression. The following sections present
formulations for the regression problem and provide solutions.
The following figure compares two polynomials that attempt to fit the shown data
points. The blue curve is the solution to the interpolation problem. The green curve
is the solution (we seek) to the linear regression problem.
Linear Least-Squares-Error (LSE) Regression:
The Straight-Line Model
The regression problem will first be illustrated for fitting the linear model (straight-
line), 𝑦(𝑥) = 𝑎1 𝑥 + 𝑎0 , to a set of 𝑛 paired experimental observations:
(𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ). So, the idea here is to position the straight-line (i.e., to
determine the regression coefficients 𝑎0 and 𝑎1 ) so that some error measure of fit is
minimized. A common error measure is the sum of the squares (SSE) of the
residual errors, 𝑒𝑖 = 𝑦𝑖 − 𝑦(𝑥𝑖):

$$E(a_0, a_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\big[y_i - y(x_i)\big]^2 = \sum_{i=1}^{n}\big(y_i - a_1 x_i - a_0\big)^2$$
The residual error 𝑒𝑖 is the discrepancy between the measured value, 𝑦𝑖 , and the
approximate value 𝑦(𝑥𝑖 ) = 𝑎0 + 𝑎1 𝑥𝑖 , predicted by the straight-line regression
model. The residual error for the 𝑖th data point is depicted in the following figure.
A solution can be obtained for the regression coefficients, {𝑎0 , 𝑎1 }, that minimizes
𝐸 (𝑎0 , 𝑎1 ). This criterion, 𝐸, which is called least-squares-error (LSE) criterion,
has a number of advantages, including that it yields a unique line for a given data
set. Differentiating 𝐸 (𝑎0 , 𝑎1 ) with respect to each of the unknown regression model
coefficients, and setting the result to zero lead to a system of two linear equations,
$$\frac{\partial}{\partial a_0} E(a_0, a_1) = 2\sum_{i=1}^{n}(y_i - a_1 x_i - a_0)(-1) = 0$$

$$\frac{\partial}{\partial a_1} E(a_0, a_1) = 2\sum_{i=1}^{n}(y_i - a_1 x_i - a_0)(-x_i) = 0$$

$$-\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0$$

$$-\sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0$$
Now, realizing that summing the constant a0 over the n data points gives n·a0, and that
multiplicative quantities that do not depend on the summation index i can be brought
outside the summation (i.e., ∑ a·xi = a ∑ xi), we may rewrite the above equations as

$$n\,a_0 + \Big(\sum_{i=1}^{n} x_i\Big) a_1 = \sum_{i=1}^{n} y_i$$

$$\Big(\sum_{i=1}^{n} x_i\Big) a_0 + \Big(\sum_{i=1}^{n} x_i^2\Big) a_1 = \sum_{i=1}^{n} x_i y_i$$

These are called the normal equations. We can solve for a1 using Cramer's rule and
for a0 by substitution (Your turn: Perform the algebra) to arrive at the following
LSE solution:

$$a_1^* = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \big(\sum_{i=1}^{n} x_i\big)^2}, \qquad a_0^* = \bar{y} - a_1^*\,\bar{x}$$

where x̄ and ȳ are the mean values of the xi and yi data, respectively.
Let the sum of the squares of the differences between the yi values and their
average value, ȳ = (1/n) ∑ yi, be

$$E_M = \sum_{i=1}^{n}(y_i - \bar{y})^2$$
Then, the (positive) difference 𝐸𝑀 − 𝐸𝐿𝑆𝐸, where 𝐸𝐿𝑆𝐸 = 𝐸(a0*, a1*) is the minimized
sum of squared errors, represents the improvement (the smaller 𝐸𝐿𝑆𝐸 is, the better)
gained by describing the data with a straight line rather than with a single average
value (a straight line with zero slope and y-intercept equal to ȳ). The coefficient of
determination, r², is defined as the relative difference between 𝐸𝑀 and 𝐸𝐿𝑆𝐸,
$$r^2 = \frac{E_M - E_{LSE}}{E_M} = 1 - \frac{E_{LSE}}{E_M}$$
For perfect fit, where the regression line goes through all data points, 𝐸𝐿𝑆𝐸 = 0 and
𝑟 2 = 1, signifying that the line explains 100% of the variability in the data. On the
other hand for 𝐸𝑀 = 𝐸𝐿𝑆𝐸 we obtain 𝑟 2 = 0, and the fit represents no improvement
over a simple average. A value of 𝑟 2 between 0 and 1 represents the extent of
improvement. So, 𝑟 2 = 0.8 indicates that 80% of the original uncertainty has been
explained by the linear model. Using the above expressions for 𝐸𝐿𝑆𝐸, 𝐸𝑀, a0* and a1*,
one may derive the following formula for the correlation coefficient, r (your turn:
perform the algebra):

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \big(\sum_{i=1}^{n} x_i\big)^2}\;\sqrt{n\sum_{i=1}^{n} y_i^2 - \big(\sum_{i=1}^{n} y_i\big)^2}}$$
Example. Fit the straight-line model to the data given in the Matlab script below.
Solution. The following Matlab script computes the linear regression coefficients,
a0* and a1*, using the LSE formulas derived above.
x=[1 2 3 4 5 6 7];            % independent-variable data
y=[2.5 7 38 55 61 122 110];   % dependent-variable data
n=length(x);                  % number of data points
a1=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.^2)-(sum(x)).^2)   % slope a1*
a0=sum(y)/n-a1*sum(x)/n                                    % intercept a0* = ybar - a1*xbar
The solution is 𝑎1∗ = 20.5536 and 𝑎0∗ = −25.7143. The following plot displays the
data and the regression model, 𝑦(𝑥 ) = 20.5536𝑥 − 25.7143.
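As a check on the quality of this fit, the coefficient of determination can be computed by continuing the script above (a minimal sketch, using the definitions of 𝐸𝐿𝑆𝐸 and 𝐸𝑀 given earlier):

E_LSE = sum((y - (a0 + a1*x)).^2);    % sum of squared residuals about the fitted line
E_M   = sum((y - mean(y)).^2);        % spread of the data about its mean
r2    = 1 - E_LSE/E_M                 % coefficient of determination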
Example. Anscombe's quartet comprises four datasets that have 𝑟 2 ≅ 0.666, yet
appear very different when graphed. Each dataset consists of eleven (𝑥𝑖 , 𝑦𝑖 ) points.
They were constructed in 1973 by the statistician Francis Anscombe to demonstrate
both the importance of graphing data before analyzing it and the effect of outliers
on statistical properties. Notice that if the outlier point in the third data set is
ignored, the remaining points fall exactly on a straight line and the regression fit
would be perfect, with r² = 1.
Your turn: Employ linear regression to generate the above plots and determine 𝑟 2
for each of the Anscombe’s data sets.
Linearization of Nonlinear Models
The straight-line regression model is not always suitable for curve fitting. The
choice of regression model is often guided by the plot of the available data, or can
be guided by the knowledge of the physical behavior of the system that generated
the data. In general, polynomial or other nonlinear models are more suitable. A
nonlinear regression technique (introduced later) is available to fit complicated
nonlinear equations to data. However, some basic nonlinear functions can be readily
transformed into linear functions in their regression coefficients (we will refer to
such functions as transformable or linearizable). Here, we can take advantage of
the LSE regression formulas, which we have just derived, to fit the transformed
equations to the data.
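As an illustration (a minimal sketch, assuming the measured data are stored in vectors x and y with all yi > 0), the exponential model y = αe^(βx) can be linearized by taking the natural logarithm of both sides, ln y = ln α + βx, and then fit with a straight line (here via polyfit):

% Linearized fit of y = alpha*exp(beta*x):  ln(y) = ln(alpha) + beta*x
p     = polyfit(x, log(y), 1);   % straight-line fit to (x, ln y)
beta  = p(1);                    % slope of the transformed fit
alpha = exp(p(2));               % intercept transformed back
yfit  = alpha*exp(beta*x);       % model evaluated at the data points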
From the results for 𝑟 2 , the power model has the best fit. The following graph
compares the three models. By visually inspecting the plot we see that, indeed, the
power model (red; 𝑟 2 = 0.9477) is a better fit compared to the linear model (blue;
𝑟 2 = 0.8811) and to the exponential model (green; 𝑟 2 = 0.8141). Also, note that
the straight-line fits the data better than the exponential model.
Your turn: Repeat the above regression problem employing: (a) the logarithmic
function, y = α ln(x) + β; (b) the reciprocal function, y = 1/(αx + β); and (c) the
saturation-growth-rate function, y(x) = αx/(β + x). Compare the results by plotting.
General Linear LSE Regression and the Polynomial Model
For some data sets, the underlying model cannot be captured accurately with a
straight-line, exponential, logarithmic or power models. A model with a higher
degree of nonlinearity (i.e., with added flexibility) is required. There are a number
of higher order functions that can be used as regression models. One important
regression model would be a polynomial. A general LSE formulation is presented
next. It extends the earlier linear regression analysis to a wider class of nonlinear
functions, including polynomials. (Note: when we say linear regression, we are
referring to a model that is linear in its regression parameters, 𝑎𝑖 , not 𝑥.)
Consider the general function in z,

$$y = a_m z_m + a_{m-1} z_{m-1} + \cdots + a_1 z_1 + a_0 \qquad (1)$$

where each zi represents a basis function of x. It can easily be shown that if the basis
functions are chosen as zi = x^i, then the above model is that of an m-degree
polynomial,

$$y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0$$
There are many classes of functions that can be described by the above general
function in Eqn. (1). Examples include:
y = a0 + a1 x,   y = a0 + a1 cos(x) + a2 sin(2x),   and   y = a0 + a1 x + a2 e^(−x²)
One example of a function that can’t be represented by the above general function
is the radial-basis-function (RBF)
y = a0 + a1 e^(a2 (x − a3)²)
In other words, this latter function is not transformable into a linear regression
model, as was the case (say) for the exponential function, y = α e^(βx). Regression
with such non-transformable functions is known as nonlinear regression
and is considered later in this lecture.
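Although the formulation below is developed for polynomial models, the same machinery applies to any model that is linear in its coefficients. As a preview (a minimal sketch, assuming the data are stored in column vectors x and y), the third example above, y = a0 + a1 x + a2 e^(−x²), can be fit using the pseudo-inverse approach discussed later in this section:

% Preview sketch: fit y = a0 + a1*x + a2*exp(-x.^2), linear in a0, a1, a2
Z = [ones(size(x)), x, exp(-x.^2)];   % one column per basis function
a = pinv(Z)*y;                        % LSE coefficients [a0; a1; a2]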
In the following formulation of the LSE regression problem we restrict the model of
the regression function to the polynomial
𝑦 = 𝑎𝑚 𝑥 𝑚 + 𝑎𝑚−1 𝑥 𝑚−1 + ⋯ + 𝑎1 𝑥 + 𝑎0
Evaluating this model at each of the n data points leads to the overdetermined linear
system Za = y:

$$\begin{bmatrix} x_1^m & x_1^{m-1} & \cdots & x_1 & 1 \\ x_2^m & x_2^{m-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^m & x_{n-1}^{m-1} & \cdots & x_{n-1} & 1 \\ x_n^m & x_n^{m-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_m \\ a_{m-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}$$

Since there are more equations than unknowns (n > m + 1), we instead seek the
coefficient vector a that minimizes the sum of the squared residuals,

$$E(\mathbf{a}) = \|\mathbf{y} - \mathbf{Z}\mathbf{a}\|^2 = \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big)^2$$
where ‖·‖ denotes the vector norm. As we did earlier in deriving the straight-line
regression coefficients a0 and a1, we set all partial derivatives ∂E(a)/∂ai to zero and
solve the resulting system of (m + 1) equations:
$$\frac{\partial}{\partial a_0} E(a_0, a_1, \ldots, a_m) = -2\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big)\, x_i^{0} = 0$$

$$\frac{\partial}{\partial a_1} E(a_0, a_1, \ldots, a_m) = -2\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big)\, x_i^{1} = 0$$

$$\frac{\partial}{\partial a_2} E(a_0, a_1, \ldots, a_m) = -2\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big)\, x_i^{2} = 0$$

⋮

$$\frac{\partial}{\partial a_m} E(a_0, a_1, \ldots, a_m) = -2\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big)\, x_i^{m} = 0$$
Dividing by −2 and rearranging gives

$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = 0$$

$$\sum_{i=1}^{n}\Big(x_i y_i - x_i \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = 0$$

$$\sum_{i=1}^{n}\Big(x_i^2 y_i - x_i^2 \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = 0$$

⋮

$$\sum_{i=1}^{n}\Big(x_i^m y_i - x_i^m \sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = 0$$
or,

$$\sum_{i=1}^{n}\Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} x_i \Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} x_i y_i$$

$$\sum_{i=1}^{n} x_i^2 \Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} x_i^2 y_i$$

⋮

$$\sum_{i=1}^{n} x_i^m \Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} x_i^m y_i$$
which, by setting z_ij = x_i^(m−j+1) (i = 1, 2, …, n; j = 1, 2, …, m + 1), can then be
expressed in matrix form as (Your turn: derive it)

$$\mathbf{Z}^{T}(\mathbf{Z}\mathbf{a}) = \mathbf{Z}^{T}\mathbf{y} \quad \text{or} \quad (\mathbf{Z}^{T}\mathbf{Z})\,\mathbf{a} = \mathbf{Z}^{T}\mathbf{y}$$
(Refer to the Appendix for an explicit representation of the above equation for the
case of a quadratic regression polynomial.)
The matrix ZᵀZ is an (m + 1)×(m + 1) square matrix (recall that m is the degree of
the polynomial model being used). Generally speaking, the inverse of ZᵀZ exists
for the above regression formulation. Multiplying both sides of the equation
by (ZᵀZ)⁻¹ leads to the LSE solution for the regression coefficient vector a,

$$\mathbf{I}\mathbf{a}^{*} = \mathbf{a}^{*} = \big[(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\big]\,\mathbf{y}$$

where I is the identity matrix. Matlab offers two ways for solving the above system
of linear equations: (1) using the left-division operator, a = (Z'*Z)\(Z'*y), where ' is
the Matlab transpose operator, or (2) using a = pinv(Z)*y, where pinv is the built-in
Matlab pseudo-inverse function that computes the matrix (ZᵀZ)⁻¹Zᵀ.
The coefficient of determination, 𝑟 2 , for the above polynomial regression
formulation is given by (for 𝑛 ≫ 𝑚)
$$r^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where 𝑦̂𝑖 is the 𝑖th component of the prediction vector Za*, and 𝑦̅ is the mean of the
𝑦𝑖 values. Matlab can conveniently compute 𝑟 2 as,
1-sum((y-Z*a).^2)/sum((y-mean(y)).^2)
Example. Employ the polynomial LSE regression formulation to solve for a cubic
curve fit for the following data set. Also, compute 𝑟 2 .
𝑥 1 2 3 4 5 6 7 8
𝑦 2.5 7 38 55 61 122 83 145
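A minimal sketch of the pseudo-inverse solution for this data (the cubic model is y = a3 x³ + a2 x² + a1 x + a0, so Z has columns x³, x², x, and 1):

x = [1 2 3 4 5 6 7 8]';
y = [2.5 7 38 55 61 122 83 145]';
Z = [x.^3 x.^2 x ones(size(x))];                 % columns: x^3, x^2, x, 1
a = pinv(Z)*y                                    % coefficients [a3; a2; a1; a0]
r2 = 1 - sum((y - Z*a).^2)/sum((y - mean(y)).^2) % coefficient of determination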
Alternatively, we may use the left-division operator, a = (Z'*Z)\(Z'*y), and obtain the same result.
The following is a snapshot of a session with the “Basic Fitting tool” (introduced in
the previous lecture) applied to data in the above example. It computes and
compares a cubic fit to a 5th-degree and a 6th-degree polynomial fit.
Your turn. Employ the linear LSE regression formulation to fit the following data
set employing the model: 𝑦(𝑥) = 𝑎0 + 𝑎1 cos(𝑥 ) + 𝑎2 sin(2𝑥). Also, determine 𝑟 2
and plot the data along with y(x). Hint: First determine the 10×3 matrix Z required
for the Za = y formulation.
𝑥 1 2 3 4 5 6 7 8 9 10
𝑦 1.17 0.93 −0.71 −1.31 2.01 3.42 1.53 1.02 −0.08 −1.51
Polynomial Regression with Matlab: Polyfit
The Matlab polyfit function was introduced in the previous lecture for solving
polynomial interpolation problems (𝑚 + 1 = 𝑛, same number of equations as
unknowns). This function can also be used for solving 𝑚-degree polynomial
regression given 𝑛 data points (𝑚 + 1 < 𝑛, more equations than unknowns). The
syntax of the polyfit call is p = polyfit(x,y,m), where x and y are the vectors of the
independent and dependent variables, respectively, and m is the degree of the
regression polynomial. The function returns a row vector, p, that contains the
polynomial coefficients in descending powers of x.
Example. Use polyfit to solve for the cubic regression model encountered in the
example from the previous section.
Solution:
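A minimal sketch using polyfit with the same data (p(1) through p(4) are the coefficients of x³, x², x and the constant term, respectively):

x = [1 2 3 4 5 6 7 8];
y = [2.5 7 38 55 61 122 83 145];
p = polyfit(x, y, 3)      % cubic regression coefficients, highest power first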
Note that this solution is identical to the one obtained using the pseudo-inverse-
based solution.
Non-Linear LSE Regression
For nonlinear models that cannot be linearized, such as y(x) = a0(1 − e^(a1 x)), setting
the partial derivatives of E(a0, a1) to zero yields a set of two nonlinear equations
(shown below for a four-point data set) that needs to be solved for the two coefficients,
a0 and a1. Numerical algorithms such as Newton's iterative method for solving a set of
two nonlinear equations, or Matlab's built-in fsolve and solve functions, can be used
to solve this system of equations, as shown in the next example.
$$\sum_{i=1}^{4}\big(y_i - a_0(1 - e^{a_1 x_i})\big)\big({-1} + e^{a_1 x_i}\big) = 0$$

$$\sum_{i=1}^{4}\big(y_i - a_0(1 - e^{a_1 x_i})\big)\big(a_0 x_i e^{a_1 x_i}\big) = 0$$
With the data values substituted, the second of these equations becomes, for example,

$$\big(1 - a_0(1 - e^{-2a_1})\big)\big({-2}a_0 e^{-2a_1}\big) + \big({-4} - a_0(1 - e^{2a_1})\big)\big(2a_0 e^{2a_1}\big) + \big({-12} - a_0(1 - e^{4a_1})\big)\big(4a_0 e^{4a_1}\big) = 0$$
Matlab's symbolic solve function returns a set of four solutions to the above minimization problem. The first
thing we notice is that for nonlinear regression, minimizing LSE may lead to multiple
solutions (multiple minima). The solutions for this particular problem are:
1. a0 = a1 = 0, which leads to: y = 0(1 − e⁰) = 0, or y = 0 (the x-axis).
2. a0 = 0 and a1 ≅ −1.3610 + 1.5708i, which leads to: y = 0.
3. a0 = 0 and a1 ≅ 0.1186 + 1.5708i, which leads to: y = 0.
4. a0 ≅ 2.4979 and a1 ≅ 0.4410, which leads to y(x) = 2.4979(1 − e^(0.441x)).
The solutions y(x) = 0 and y(x) = 2.4979(1 − e^(0.441x)) are plotted below. It is
obvious that the optimal solution (in the LSE sense) is y(x) = 2.4979(1 − e^(0.441x)).
Your turn: Solve the above system of two nonlinear equations employing Matlab’s
fsolve.
Your turn: Fit the exponential model, 𝑦 = 𝛼𝑒 𝛽𝑥 , to the data in the following table
employing nonlinear least squares regression. Then, linearize the model and
determine the model coefficients by employing linear least squares regression (i.e.,
use the formulas derived in the first section or polyfit). Plot the solutions.
𝑥 0 1 2 3 4
𝑦 1.5 2.5 3.5 5.0 7.5
In the above example, we were lucky in the sense that the (symbolic-based) solve
function returned the optimal solution for the optimization problem at hand. In more
general non-linear LSE regression problems the models employed are complex and
normally have more than two unknown coefficients. Here, solving (symbolically) for
the partial derivatives of the error function becomes tedious and impractical.
Therefore, one would use numerically-based multi-variable optimization algorithms
to minimize 𝐸 (𝒂) = 𝐸(𝑎0 , 𝑎1 , 𝑎2 , … ), which are extensions of the ones considered in
Lectures 13 and 14.
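For the model considered above, y(x) = a0(1 − e^(a1 x)), the direct numerical approach looks as follows (a minimal sketch, assuming the data are stored in vectors x and y and that an initial guess a_init = [a0_guess a1_guess] has been chosen):

model = @(a,x) a(1)*(1 - exp(a(2)*x));     % y(x) = a0*(1 - exp(a1*x))
sse   = @(a) sum((y - model(a,x)).^2);     % error function E(a0,a1)
a_opt = fminsearch(sse, a_init)            % numerical minimization of E

Since fminsearch performs a derivative-free (Nelder-Mead simplex) search, only the error function itself needs to be coded.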
A better way to select the initial search vector a = [a0 a1] for the above
optimization problem is to solve a set of 𝑘 nonlinear equations that is obtained from
forcing the model to go through 𝑘 points (selected randomly from the data set). Here,
𝑘 is the number of unknown model parameters. For example, for the above problem,
we solve the set of two nonlinear equations
𝑦𝑖 − 𝑎0 (1 − 𝑒 𝑎1 𝑥𝑖 ) = 0
𝑦𝑗 − 𝑎0 (1 − 𝑒 𝑎1 𝑥𝑗 ) = 0
where (𝑥𝑖 , 𝑦𝑖 ) and (𝑥𝑗 , 𝑦𝑗 ) are two distinct points selected randomly from the set of
points being fitted. A numerical nonlinear equation solver can be used, say Matlab’s
fsolve, as shown below [here, the end points (−2,1) and (4, −12) were selected].
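A minimal sketch of this initialization step (the fsolve call requires the Optimization Toolbox), forcing the model through the end points (−2, 1) and (4, −12):

% Force y = a0*(1 - exp(a1*x)) through (-2,1) and (4,-12) to get an initial guess
F = @(a) [  1 - a(1)*(1 - exp(-2*a(2))); ...
          -12 - a(1)*(1 - exp( 4*a(2)))];
a_init = fsolve(F, [1 1])       % solves F(a) = 0 starting from [1 1]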
Your turn: The height of a person at different ages is reported in the following table.
x (age) 0 5 8 12 16 18
y (in) 20 36.2 52 60 69.2 70
Ans. y(x) = 0.888 / √(x⁴ − 0.956x² + 0.819)
As mentioned earlier, different initial conditions may lead to different local minima
of the nonlinear function being minimized. For example, consider the function of
two variables that exhibits multiple minima (refer to the plot):
𝑓(𝑥, 𝑦) = −0.02 sin(𝑥 + 4𝑦) − 0.2 cos(2𝑥 + 3𝑦) − 0.3 sin(2𝑥 − 𝑦) + 0.4cos(𝑥 − 2𝑦)
The following are the local minima discovered by function grad_optm2d for the
indicated initial conditions:
The same local minima are discovered by fminsearch when starting from the same
initial conditions:
Note that, for this limited set of searches, the solution with the smallest function value is
(x*, y*) = (0.0441, −1.7618), which is the best of the local minima found.
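A minimal sketch of such a search with fminsearch, using the function f defined above and a few arbitrarily chosen starting points:

f = @(v) -0.02*sin(v(1) + 4*v(2)) - 0.2*cos(2*v(1) + 3*v(2)) ...
         - 0.3*sin(2*v(1) - v(2)) + 0.4*cos(v(1) - 2*v(2));
starts = [0 0; 2 -2; -2 2; 1 -2];          % arbitrary initial points
for k = 1:size(starts,1)
    [v, fval] = fminsearch(f, starts(k,:));
    fprintf('start (%g,%g) -> minimum (%.4f,%.4f), f = %.4f\n', starts(k,:), v, fval);
end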
Your turn (Email your solution to your instructor one day before Test 3).
Fit the following data
𝑥 1 2 3 4 5 6 7 8 9 10
𝑦 1.17 0.93 −0.71 −1.31 2.01 3.42 1.53 1.02 −0.08 −1.51
employing the model
𝑦(𝑥) = 𝑎0 + 𝑎1 cos(𝑥 + 𝑏1 ) + 𝑎2 cos(2𝑥 + 𝑏2 )
This problem can be solved employing nonlinear regression (think solution via
fminsearch), or it can be linearized which allows you to use linear regression (think
solution via pseudo-inverse). Hint: cos(𝑥 + 𝑏) = cos(𝑥 ) cos(𝑏) − sin(𝑥 ) sin(𝑏).
Plot the data points and 𝑦(𝑥) on the same graph.
Solution of Differential Equations Based on LSE Minimization
The LSE-minimization idea can also be used to obtain approximate analytic solutions of
differential equations. Consider, as an example, the second-order differential equation

$$\ddot{y}(x) + \frac{1}{5}\dot{y}(x) + 9x^2 y(x) = 0 \quad \text{with } y(0) = 1 \text{ and } \dot{y}(0) = 2$$

over the interval x ∈ [0 1]. We seek an approximate solution in the form of the
fourth-degree polynomial ỹ(x) = a0 + a1 x + a2 x² + a3 x³ + a4 x⁴. Imposing the
initial conditions immediately fixes two of the coefficients:

$$\tilde{y}(0) = a_0 = y(0) = 1 \quad \text{and} \quad \frac{d\tilde{y}(0)}{dx} = a_1 = \dot{y}(0) = 2$$
Now, we are left with the problem of estimating the remaining polynomial
coefficients 𝑎2 , 𝑎3 , 𝑎4 such that the residual
𝑑2 𝑦̃(𝑥 ) 1 𝑑𝑦̃(𝑥 )
𝑓(𝑥, 𝑎2 , 𝑎3 , 𝑎4 ) = 2
+ + 9𝑥 2 𝑦̃(𝑥 )
𝑑𝑥 5 𝑑𝑥
is as close to zero as possible for all 𝑥 ∈ [0 1]. We will choose to minimize the
integral of the squared residual,
$$I(a_2, a_3, a_4) = \int_{0}^{1}\big[f(x, a_2, a_3, a_4)\big]^2\,dx$$
𝑑𝑦̃(𝑥) 𝑑2 𝑦̃(𝑥)
First, we compute the derivatives, and ,
𝑑𝑥 𝑑𝑥 2
𝑑𝑦̃(𝑥 )
= 2 + 2𝑎2 𝑥 + 3𝑎3 𝑥 2 + 4𝑎4 𝑥 3
𝑑𝑥
𝑑 2 𝑦̃(𝑥 )
2
= 2𝑎2 + 6𝑎3 𝑥 + 12𝑎4 𝑥 2
𝑑𝑥
2 2
+ 9𝑥 2 + 18𝑥 3 + 𝑎2 (2 + 𝑥 + 9𝑥 4 ) +
𝑓 (𝑥, 𝑎2 , 𝑎3 , 𝑎4 ) =
5 5
3 4
𝑎3 (6𝑥 + 𝑥 2 + 9𝑥 5 ) + 𝑎4 (12𝑥 2 + 𝑥 3 + 9𝑥 6 )
5 5
The following Matlab session shows the results of using fminsearch to solve for the
coefficients 𝑎2 , 𝑎3 , 𝑎4 that minimize the error function
$$I(a_2, a_3, a_4) = \int_{0}^{1}\big[f(x, a_2, a_3, a_4)\big]^2\,dx$$
Note: the first component a(1) in the solution vector ’a’ is redundant; it is not used in function 𝐼.
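A minimal sketch of this minimization (parameterizing the unknowns directly as p = [a2 a3 a4] and using Matlab's integral function for the numerical quadrature):

% Residual f(x; a2,a3,a4) from the expansion derived above
f = @(x,p) 2/5 + 9*x.^2 + 18*x.^3 + p(1)*(2 + (2/5)*x + 9*x.^4) ...
           + p(2)*(6*x + (3/5)*x.^2 + 9*x.^5) + p(3)*(12*x.^2 + (4/5)*x.^3 + 9*x.^6);
I = @(p) integral(@(x) f(x,p).^2, 0, 1);   % integral of the squared residual
p = fminsearch(I, [0 0 0])                 % minimize I starting from [0 0 0]
ytilde = @(x) 1 + 2*x + p(1)*x.^2 + p(2)*x.^3 + p(3)*x.^4;   % approximate solution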
Your turn: Consider the first-order, nonlinear, homogeneous differential equation with varying
coefficient 𝑦̇ (𝑥) + (2𝑥 − 1)𝑦 2 (𝑥) = 0, with 𝑦(0) = 1 and 𝑥 ∈ [0 1].
Employ the method of minimizing the squared residual to solve for the approximate solution
𝑦̃(𝑥) = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥 2 + 𝑎3 𝑥 3 + 𝑎4 𝑥 4 over the interval 𝑥 ∈ [0 1]. Plot 𝑦̃(𝑥) and the exact
solution 𝑦(𝑥) given by
$$y(x) = \frac{1}{x^2 - x + 1}$$
Your turn: Determine the parabola 𝑦(𝑥) = 𝑎𝑥 2 + 𝑏𝑥 + 𝑐 that approximates the cubic 𝑔(𝑥) =
2𝑥 3 − 𝑥 2 + 𝑥 + 1 (over the interval 𝑥 ∈ [0 2]) in the LSE sense. In other words, determine the
coefficients 𝑎, 𝑏 and 𝑐 such that the following error function is minimized,
$$E(a, b, c) = \int_{0}^{2}\big[g(x) - y(x)\big]^2\,dx$$
Solve the problem in two ways: (1) analytically; and (2) employing fminsearch after evaluating
the integral. Plot g(x) and y(x) on the same set of axes.
Appendix: Explicit Matrix Formulation for the Quadratic Regression Problem
Earlier in this lecture we derived the m-degree polynomial LSE regression
formulation as follows:
$$\sum_{i=1}^{n}\Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} x_i \Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} x_i y_i$$

$$\sum_{i=1}^{n} x_i^2 \Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} x_i^2 y_i$$

⋮

$$\sum_{i=1}^{n} x_i^m \Big(\sum_{j=1}^{m+1} x_i^{\,m-j+1}\, a_{m-j+1}\Big) = \sum_{i=1}^{n} x_i^m y_i$$
For a quadratic model (m = 2), these equations reduce to

$$\sum_{i=1}^{n}\Big(\sum_{j=1}^{3} x_i^{\,3-j}\, a_{3-j}\Big) = \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} x_i \Big(\sum_{j=1}^{3} x_i^{\,3-j}\, a_{3-j}\Big) = \sum_{i=1}^{n} x_i y_i$$

$$\sum_{i=1}^{n} x_i^2 \Big(\sum_{j=1}^{3} x_i^{\,3-j}\, a_{3-j}\Big) = \sum_{i=1}^{n} x_i^2 y_i$$
It can be shown that the above equations (Your Turn) can be cast in matrix form as,
$$\begin{bmatrix} n & \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 & \sum_{i=1}^{n} x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} x_i^2 y_i \end{bmatrix}$$
This is a 3×3 linear system that can be solved using the methods of Lectures 15 & 16.
With this type of formulation, care must be taken to employ numerical solution
methods that can handle ill-conditioned coefficient matrices; note the dominance of the
(all positive) coefficients in the last row of the matrix.
The following two-part video (part1, part2) derives the above result (directly) from
basic principles. Here is an example of quadratic regression: Part 1 Part 2.
Example. Employ the above formulation to fit a parabola to the following data.
𝑥 0 5 8 12 16 18
𝑦 20 36.2 52 60 69.2 70
Matlab solution: a script that builds and solves the above 3×3 system for this data is shown below.
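A minimal sketch of such a script, using the quadratic normal-equation formulation derived above:

x = [0 5 8 12 16 18];
y = [20 36.2 52 60 69.2 70];
n = length(x);
A = [n          sum(x)     sum(x.^2)
     sum(x)     sum(x.^2)  sum(x.^3)
     sum(x.^2)  sum(x.^3)  sum(x.^4)];     % 3x3 normal-equation matrix
b = [sum(y); sum(x.*y); sum((x.^2).*y)];   % right-hand-side vector
a = A\b                                     % regression coefficients [a0; a1; a2]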
Your turn: Verify the above solution employing polyfit. Repeat employing the
pseudo-inverse solution a = pinv(Z)*y applied to the formulation

$$\mathbf{Z}\mathbf{a} = \begin{bmatrix} x_1^m & x_1^{m-1} & \cdots & x_1 & 1 \\ x_2^m & x_2^{m-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^m & x_{n-1}^{m-1} & \cdots & x_{n-1} & 1 \\ x_n^m & x_n^{m-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_m \\ a_{m-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}$$
Your turn: Extend the explicit matrix formulation of this appendix to a cubic function
𝑓(𝑥 ) = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥 2 + 𝑎3 𝑥 3 and use it to determine the polynomial coefficients
for the data set from the last example. Compare (by plotting) your solution to the
following solution that was obtained using nonlinear regression,
$$y(x) = \frac{74.321}{1 + 2.823\,e^{-0.217x}}$$
Ans. Your solution should look like the one depicted below.
Solution Strategies for LSE Regression Problem