Linear and nonlinear regression techniques using Excel Solver tool
Spreadsheets by S.K. Dentel 10/06.
These worksheets describe the use of regression for data analysis.
It is assumed that you already know how to use the Excel Regression tool.
Here we will cover:
1) Linear regression using Excel Solver
2) Nonlinear regression using Excel Solver
We will not worry about derivations at this point.
Why are we doing linear regression by two different methods? This is for two reasons:
First, I want you to learn how to use the Solver tool because it is useful for lots of situations when you
cannot get an analytical solution for an unknown.
Second, if we first use the Solver to duplicate another method, we can be sure that it works
before applying it to more difficult problems like nonlinear regression.
So the data below (in yellow) are what we want to analyze. First we graph them to check for outliers or other anomalies.
The graph is shown below, and it has some noise, but nothing that's totally unreasonable.
Now we want to compare these data to models, meaning equations that we think will fit the data. We would like to see
which model equation fits the data best. That equation can then be used to predict or scale up the adsorption process,
from the 1-liter experiment to the pond, and to any carbon dose.
Here are three possibilities:
1. Linear equation: $q = mC + b$
2. Langmuir equation: $q = q_{max} \dfrac{K_L C}{1 + K_L C}$
3. Freundlich equation: $q = K_F C^{1/n}$
Carbon dose   C        C0-C = X   X/M = q
(mg/L)        (mg/L)   (mg/L)     (mg/g)
0 (control)   0.110
1             0.059    0.051      51.0
2.5           0.015    0.095      38.0
5             0.01     0.100      20.0
10            0.008    0.102      10.2
25            0.001    0.109      4.36
100           0.0006   0.109      1.09
-             0        -          0.00
[Chart: q (mg/g) versus C (mg/L)]
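The X and q columns follow directly from the measured concentrations. The short Python sketch below shows one way to reproduce them; the function name `loading` and the variable names are assumptions made for illustration, not part of the worksheet.

```python
# Measured equilibrium concentration C (mg/L) at each carbon dose M (mg/L).
C0 = 0.110                      # control sample with no carbon, mg/L
doses = [1, 2.5, 5, 10, 25, 100]
conc = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006]

def loading(c0, c, dose):
    """Return q in mg/g: removed mass X = c0 - c (mg/L) per carbon dose (mg/L),
    multiplied by 1000 to convert mg/mg to mg/g."""
    return (c0 - c) / dose * 1000.0

q = [loading(C0, c, m) for c, m in zip(conc, doses)]

for m, c, qi in zip(doses, conc, q):
    print(f"{m:>6} {c:>8} {C0 - c:8.4f} {qi:8.3f}")
```

The last loading comes out as 1.094 mg/g, which the table displays rounded to 1.09.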
1. Now let's try fitting the adsorption data to the linear model, using Solver.
First, we'll do a linear regression on q vs. C. I know, the data don't look linear, but this is to show you
how to do a regression using the Solver tool. Since you already know how to use Excel's
linear regression capability (if not, go to the last worksheet), you'll be able to check your results.
So our function is q = mC+b. We want the best values for m and b. Here are the steps:
1. Put estimated values for m and b in a couple of cells (shown here in orange).
We'll use these in the above equation for our estimated q values.
2. Put the estimated q's in a new column. We'll call this "q-hat".
3. We'll now create a number that indicates how good our estimated q values are. In another column, compute
q [the measured value] minus q-hat, and SQUARE this. Sum these up: it's the "residual sum of squares" or RSS.
4. Note that if all of our estimates were exactly equal to the data, this RSS would be zero. The better
the fit, the lower RSS is. So if we can readjust m and b to get a better fit, the RSS will go down. The best
values of m and b will minimize the RSS. There is a tool that does this automatically, called SOLVER.
If it's not under the TOOLS menu, you can get it by going to Tools/Add-Ins and adding it.
5. In Solver, set the target cell as the cell where the RSS is calculated. Have it find the MIN value,*
by changing the cells that have the values for m and b. When you hit SOLVE, m and b will be changed
iteratively until the RSS is minimized. The linear regression is done!
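The same idea can be sketched outside Excel. The Python below is a minimal illustration, not a copy of Solver's algorithm: it defines the RSS for the line q = mC + b, gets m and b from the explicit least-squares formulas (the kind of result Excel's Regression tool reports), and shows that nudging either value raises the RSS. The function names are hypothetical, and the last q value is taken as 1.094 (X/M before rounding) rather than the displayed 1.09.

```python
# Worksheet data: equilibrium concentration C (mg/L) and loading q (mg/g).
# The final (0, 0) pair mirrors the extra zero row in the worksheet table.
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]   # 1.094 shows as 1.09 in the table

def rss(m, b):
    """Sum of squared differences between measured q and q-hat = m*C + b."""
    return sum((qi - (m * ci + b)) ** 2 for ci, qi in zip(C, q))

def closed_form_line(x, y):
    """Explicit least-squares slope and intercept for a straight line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    return m, y_bar - m * x_bar

m_best, b_best = closed_form_line(C, q)
print(f"m = {m_best:.4f}, b = {b_best:.4f}, RSS = {rss(m_best, b_best):.4f}")

# Moving either parameter away from its best value makes the RSS larger,
# which is the property an iterative minimizer relies on.
print(rss(m_best + 10, b_best), rss(m_best, b_best + 1))
```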
Values:
m =
b =

Carbon dose   C        C0-C = X   X/M = q   q-hat       (q-q-hat)^2
(mg/L)        (mg/L)   (mg/L)     (mg/g)    (mg/g)      values:
0 (control)   0.110    -          -         (exclude)   (exclude)
1             0.059    0.051      51.0
2.5           0.015    0.095      38.0
5             0.01     0.1        20.0
10            0.008    0.102      10.2
25            0.001    0.109      4.36
100           0.0006   0.1094     1.09
-             0        -          0.00
                                            RSS = Sum:

Do this yourself by filling in the equations in the empty cells of the table above,
then use SOLVE to minimize RSS.
When done, go to the next sheet.

*Note that when you use "minimum" rather than "equals," you are not really using Solver to Solve,
only to get the lowest value possible. It's still a trial-and-error approach, done by numerical iteration.
Example use of linear and nonlinear regression techniques to find best-fit model
Below is what you should have obtained.
Is this the right result? We can check it using Excel's built-in linear regression capability.
Choose Data/Data Analysis/Regression, choose the q values for the Y-range and
the C values for the X-range. If you did it right, the m and b values will be the same.
It is always a good policy to GRAPH the model line against the data. This is done in the chart on this sheet.
[Chart: q (mg/g) versus C (mg/L), data with the fitted line]
Of course, the built-in linear regression also gives an r value and lots of other information. We can calculate all of that
ourselves; the exercise below does so for r.
Values:
m = 839.30751
b = 6.5849737

Carbon dose   C        C0-C = X   X/M = q   q-hat        (q-q-hat)^2   (q-q-bar)^2
(mg/L)        (mg/L)   (mg/L)     (mg/g)    (mg/g)       values:       values:
0 (control)   0.110
1             0.059    0.051      51.0      56.1041168   26.052009
2.5           0.015    0.095      38.0      19.1745864   354.3962
5             0.01     0.1        20.0      14.9780488   25.219994
10            0.008    0.102      10.2      13.2994338   9.6064898
25            0.001    0.109      4.36      7.42428121   9.3898193
100           0.0006   0.1094     1.09      7.08855821   35.934728
-             0        -          0.00      6.5849737    43.361879
q-bar:                                      Sum=RSS:     503.96112     Sum=TSS:
r = 1-RSS/TSS =

Exercise: add the equations to compute q-bar, the squared differences between q and q-bar,
and then the sum of these, which is the TSS.
After you have this column and its sum, compute r below. See if it agrees with
the value in "Summary Output" below. (With this formula, r corresponds to the "R Square" entry.)
If it does NOT agree, check your cell entries for correctness. The cell entry for q-bar, the right-most
column and its sum for TSS, and the formula for r should all be the same as on the next sheet.
When done, go to the next sheet.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.886283
R Square            0.785498
Adjusted R Square   0.742597
Standard Error      10.03953
Observations        7

ANOVA
             df   SS         MS         F            Significance F
Regression   1    1845.483   1845.483   18.3097707   0.0078711
Residual     5    503.9611   100.7922
Total        6    2349.444

               Coefficients   Standard Error   t Stat     P-value      Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      6.584967       4.612779         1.427549   0.21277735   -5.27254    18.44247    -5.27254      18.44247
X Variable 1   839.308        196.1462         4.278992   0.00787106   335.09896   1343.517    335.099       1343.517
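For readers who want to check the exercise outside Excel, here is a minimal Python sketch of the same arithmetic. It takes the m and b values quoted on the worksheet as given; the variable names are assumptions, and the last q value is entered as 1.094 (displayed as 1.09 in the table).

```python
# Worksheet data; the final (0, 0) pair is the extra zero row in the table.
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]

m, b = 839.30751, 6.5849737          # Solver result quoted on the worksheet

q_hat = [m * ci + b for ci in C]     # model estimates
q_bar = sum(q) / len(q)              # average of the measured q values

rss = sum((qi - qh) ** 2 for qi, qh in zip(q, q_hat))   # residual sum of squares
tss = sum((qi - q_bar) ** 2 for qi in q)                # total sum of squares
r = 1 - rss / tss                    # matches "R Square" in the Summary Output

print(f"q-bar = {q_bar:.4f}, RSS = {rss:.4f}, TSS = {tss:.3f}, r = {r:.6f}")
```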
Why did we go to the trouble of using SOLVE to do a linear regression, when Excel has a
built-in capability to do this?
Linear regressions are easy, because there are explicit solutions for m, b, and r. For other cases,
it's not so straightforward. But now that you see how a linear regression works, you can use the same approach
(and SOLVER) for ANY equation. If the equation is anything other than the equation of a line, it's known as
NON-linear regression. The method even works for complex models that use more than one
equation.
So let's try a Langmuir isotherm:
$q = q_{max} \dfrac{K_L C}{1 + K_L C}$
This is easy. Change the labels for the two fitting parameters to "q max" and "KL."
Rewrite the equation for "q-hat" to use these parameters as in the Langmuir equation.
Use "SOLVE" exactly as before, minimizing the RSS by adjusting q max and KL.
You'll get the best values for these two parameters, and the r value.
You can also solve by maximizing the cell with the r value - you'll get the same result.
Once again: ALWAYS plot the data and the model on the same graph for visual examination!
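As a rough cross-check outside Excel, the Langmuir fit can be sketched in plain Python. This is not Solver's algorithm: the sketch swaps in a simpler technique that happens to suit this model. For any trial KL the equation is linear in qmax, so qmax has an explicit least-squares value, and a golden-section search then finds the KL that gives the lowest RSS. The function names, the search bounds (1 to 500), and the tolerance are all assumptions, and the search presumes the RSS has a single dip in that range.

```python
import math

# Worksheet data; the final (0, 0) pair is the extra zero row in the table.
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]   # 1.094 shows as 1.09 in the table

def langmuir_shape(kl, c):
    """K_L*C / (1 + K_L*C): the factor that multiplies qmax."""
    return kl * c / (1.0 + kl * c)

def best_qmax(kl):
    """Explicit least-squares qmax for a fixed trial K_L."""
    f = [langmuir_shape(kl, ci) for ci in C]
    return sum(qi * fi for qi, fi in zip(q, f)) / sum(fi * fi for fi in f)

def rss_for(kl):
    """RSS at a trial K_L, using the best qmax for that K_L."""
    qmax = best_qmax(kl)
    return sum((qi - qmax * langmuir_shape(kl, ci)) ** 2 for ci, qi in zip(C, q))

def golden_min(func, lo, hi, tol=1e-6):
    """Golden-section search for the minimum of a single-dip function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if func(x1) < func(x2):
            b = x2          # minimum lies in [a, x2]
        else:
            a = x1          # minimum lies in [x1, b]
    return (a + b) / 2.0

kl = golden_min(rss_for, 1.0, 500.0)   # assumed search range for K_L
qmax = best_qmax(kl)
print(f"qmax = {qmax:.3f}, KL = {kl:.3f}, RSS = {rss_for(kl):.3f}")
```

Because it minimizes the same RSS, the result should land close to whatever Solver reports for the same data.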
(A smoother plot is constructed further down on this sheet)
Values:
qmax = 73.26604
KL = 42.60169

Carbon dose   C        C0-C = X   X/M = q   q-hat      (q-q-hat)^2   (q-q-bar)^2
(mg/L)        (mg/L)   (mg/L)     (mg/g)    (mg/g)     values:       values:
0 (control)   0.110
1             0.059    0.051      51.0      52.41332   1.997466      1101.728
2.5           0.015    0.095      38.0      28.56506   89.0181       407.7284
5             0.01     0.1        20.0      21.88794   3.564314      4.806117
10            0.008    0.102      10.2      18.62306   70.94802      57.87732
25            0.001    0.109      4.36      2.99372    1.866722      180.841
100           0.0006   0.1094     1.09      1.826078   0.535938      279.3482
-             0        -          0.00      0          0             317.1147
q-bar: 17.8                                 Sum=RSS:   167.9306      Sum=TSS: 2349.444
r = 1-RSS/TSS = 0.928523
For a smoother model curve, create a column with many more values for the x-axis,
and compute the y-value for each of them:
C q-hat
0 0
0.002 5.752391
0.004 10.66726
0.006 14.91509
0.008 18.62306
0.01 21.88794
0.012 24.78466
0.014 27.37218
0.016 29.6975
0.018 31.79854
0.02 33.70627
0.022 35.44618
0.024 37.03949
0.026 38.50398
0.028 39.85467
0.03 41.10432
0.032 42.26386
0.034 43.34269
0.036 44.34897
0.038 45.28978
0.04 46.17129
0.042 46.99895
0.044 47.77754
0.046 48.51131
0.048 49.204
0.05 49.85899
0.052 50.47927
0.054 51.06752
0.056 51.62616
0.058 52.15737
0.06 52.66313
0.062 53.14522
0.064 53.60527
0.066 54.04474
0.068 54.465
0.07 54.86728
0.072 55.2527
0.074 55.62231
0.076 55.97705
0.078 56.31781
0.08 56.6454
0.082 56.96056
0.084 57.264
0.086 57.55634
0.088 57.8382
0.09 58.11012
0.092 58.37262
0.094 58.62618
0.096 58.87126
0.098 59.10826
0.1 59.33758
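The column above is just the Langmuir equation evaluated on an evenly spaced grid. A minimal Python sketch of that step, using the qmax and KL values quoted on this sheet (the names `langmuir`, `c_grid`, and `curve` are assumptions):

```python
qmax, kl = 73.26604, 42.60169    # best-fit Langmuir values quoted on the worksheet

def langmuir(c):
    """Langmuir q-hat (mg/g) at concentration c (mg/L)."""
    return qmax * kl * c / (1.0 + kl * c)

# 51 evenly spaced concentrations from 0 to 0.1 mg/L in steps of 0.002.
c_grid = [round(0.002 * i, 3) for i in range(51)]
curve = [(c, langmuir(c)) for c in c_grid]

for c, q_hat in curve[:4]:
    print(f"{c:.3f} {q_hat:.5f}")
```

Plotting `curve` as a line together with the measured points as markers gives the smoother chart described above.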
[Chart: q (mg/g) versus C (mg/L), data with the fitted Langmuir curve; r = 1-RSS/TSS = 0.928523]
So that's it. In general, the steps for optimizing the fit of ANY equation or model
to data go like this:
1. Tabulate and graph the data.
2. Choose the equation you want to use, and place estimated values for the equation parameters in cells.
3. Add a column to your table with computed y values ("y-hat") using the x's, the equation, and the fitting parameters.
4. Add a column with computed squares of residuals (y - y-hat)^2, and sum these to get the residual sum of squares, RSS.
5. Add a cell to compute the average y (y-bar), and use it in a new column to compute (y - y-bar)^2.
6. Sum these to get the total sum of squares, TSS, and use it with the RSS to compute r.
7. Use SOLVE to find the parameter values that maximize r or minimize the RSS. (This is why it's called a LEAST SQUARES method.)
8. You're not done until you PLOT the model equation along with the data for visual comparison.
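In symbols, steps 4 to 6 amount to the following, where $y_i$ are the data, $\hat{y}_i$ the model values, and $N$ the number of data points (the index $i$ and the symbol $N$ are notation assumed here, not taken from the worksheets):

```latex
\begin{aligned}
\mathrm{RSS} &= \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2, &
\bar{y} &= \frac{1}{N}\sum_{i=1}^{N} y_i, \\
\mathrm{TSS} &= \sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2, &
r &= 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}.
\end{aligned}
```

Since TSS depends only on the data, maximizing r and minimizing RSS pick out the same parameter values.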
If you do the above steps for more than one model equation (such as a comparison of the Langmuir and
Freundlich equations), the model with the highest r is the best fit. (This is NOT the case if using the
more conventional method of plotting linearized equations such as reciprocal or log-log plots
to get the fitting parameters, because this method gives r values for the linearized plot, not the original
plot, so they're not comparisons on the same basis.)
Below is a fit for the Freundlich isotherm, and a plot showing it and the Langmuir.
The Langmuir is a better fit.
1/n = 0.535729
KF = 243.3178

Carbon dose   C        C0-C = X   X/M = q   q-hat          (q-q-hat)^2   (q-q-bar)^2
(mg/L)        (mg/L)   (mg/L)     (mg/g)    (mg/g)         values:       values:
0 (control)   0.110
1             0.059    0.051      51.0      53.417428052   5.843958      1101.728
2.5           0.015    0.095      38.0      25.647917053   152.574       407.7284
5             0.01     0.1        20.0      20.640243689   0.409912      4.806117
10            0.008    0.102      10.2      18.314592872   65.84662      57.87732
25            0.001    0.109      4.36      6.0115343475   2.727566      180.841
100           0.0006   0.1094     1.09      4.5722968212   12.09855      279.3482
-             0        -          0.00      0              0             317.1147
q-bar: 17.8                                 Sum=RSS:       239.5006      Sum=TSS: 2349.444
r = 1-RSS/TSS = 0.898061
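The comparison between the two isotherms can also be sketched in plain Python. As a stand-in for Solver (not a reproduction of it), the sketch uses the fact that both models have the form scale x shape(p, C): for a trial p the scale (qmax or KF) has an explicit least-squares value, and a golden-section search finds the p (KL or 1/n) with the lowest RSS. The helper name `fit_one_shape`, the search ranges, and the tolerance are assumptions, and each search presumes a single dip in the RSS over its range.

```python
import math

# Worksheet data; the final (0, 0) pair is the extra zero row in the table.
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]   # 1.094 shows as 1.09 in the table

def fit_one_shape(shape, lo, hi, tol=1e-6):
    """Least-squares fit of q = scale * shape(p, C). Returns (scale, p, rss)."""
    def scale_for(p):
        f = [shape(p, ci) for ci in C]
        return sum(qi * fi for qi, fi in zip(q, f)) / sum(fi * fi for fi in f)

    def rss_for(p):
        s = scale_for(p)
        return sum((qi - s * shape(p, ci)) ** 2 for ci, qi in zip(C, q))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:                       # golden-section search on p
        x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if rss_for(x1) < rss_for(x2):
            b = x2
        else:
            a = x1
    p = (a + b) / 2.0
    return scale_for(p), p, rss_for(p)

def langmuir_shape(kl, c):
    return kl * c / (1.0 + kl * c)

def freundlich_shape(inv_n, c):
    return c ** inv_n

q_bar = sum(q) / len(q)
tss = sum((qi - q_bar) ** 2 for qi in q)

qmax, kl, rss_l = fit_one_shape(langmuir_shape, 1.0, 500.0)    # assumed range for KL
kf, inv_n, rss_f = fit_one_shape(freundlich_shape, 0.05, 2.0)  # assumed range for 1/n
r_langmuir = 1 - rss_l / tss
r_freundlich = 1 - rss_f / tss

print(f"Langmuir:   qmax = {qmax:.2f}, KL = {kl:.2f}, r = {r_langmuir:.4f}")
print(f"Freundlich: KF = {kf:.1f}, 1/n = {inv_n:.4f}, r = {r_freundlich:.4f}")
```

Both r values are computed against the same TSS and the same untransformed data, so they can be compared directly.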
Smoother model curve:
C q-hat
0 0
0.002 8.7147709726
0.004 12.63358549
0.006 15.69870771
0.008 18.314592872
0.01 20.640243689
0.012 22.758023884
0.014 24.717236793
0.016 26.55020717
0.018 28.279506659
0.02 29.921644987
0.022 31.489136234
0.024 32.991737961
0.026 34.437235423
0.028 35.831959908
0.03 37.181143802
0.032 38.489171215
0.034 39.759759468
0.036 40.996093428
0.038 42.200926875
0.04 43.376660285
0.042 44.525401416
0.044 45.649013138
0.046 46.749151637
0.048 47.827297275
0.05 48.884779747
0.052 49.922798789
0.054 50.942441358
0.056 51.944695988
0.058 52.930464878
0.06 53.90057413
0.062 54.855782476
0.064 55.796788752
0.066 56.724238344
0.068 57.63872876
0.07 58.540814478
0.072 59.431011178
0.074 60.309799456
0.076 61.177628089
0.078 62.034916917
0.08 62.882059403
0.082 63.719424901
0.084 64.547360686
0.086 65.366193763
0.088 66.176232493
0.09 66.977768055
0.092 67.771075758
0.094 68.556416236
0.096 69.334036521
0.098 70.104171021
0.1 70.867042408
[Chart: q (mg/g) versus C (mg/L), data with the fitted Freundlich curve; r = 1-RSS/TSS = 0.898061]
[Chart: q (mg/g) versus C (mg/L), with series Data, Langmuir, and Freundlich]
The "Langmuir-Freundlich" isotherm has the form
$q = q_{max} \dfrac{(K_L C)^n}{1 + (K_L C)^n}$
For the data, evaluate this model.
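One possible way to set the exercise up is sketched below in Python: the model function, its RSS, and a starting guess. The sketch deliberately stops short of the fit itself. The function names are assumptions; the starting guess relies on the fact that with n = 1 the equation reduces to the Langmuir form, so the Langmuir values quoted on the earlier sheet are a reasonable place to begin.

```python
# Worksheet data; the final (0, 0) pair is the extra zero row in the table.
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]   # 1.094 shows as 1.09 in the table

def langmuir_freundlich(c, qmax, kl, n):
    """q = qmax * (K_L*C)^n / (1 + (K_L*C)^n)."""
    t = (kl * c) ** n
    return qmax * t / (1.0 + t)

def rss(qmax, kl, n):
    """Residual sum of squares for one trial set of the three parameters."""
    return sum((qi - langmuir_freundlich(ci, qmax, kl, n)) ** 2
               for ci, qi in zip(C, q))

# n = 1 turns the model into the Langmuir equation, so the Langmuir best-fit
# values make a natural starting guess for the three-parameter fit.
start = (73.26604, 42.60169, 1.0)
print(f"RSS at the Langmuir starting point: {rss(*start):.4f}")
```

Any three-parameter fit that starts here can only lower the RSS, so the resulting r will be at least as high as the Langmuir value.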