An Adaptive Nonlinear Least-Squares Algorithm
JOHN E. DENNIS, JR.
Rice University
and
DAVID M. GAY and ROY E. WELSCH
Massachusetts Institute of Technology
NL2SOL is a modular program for solving nonlinear least-squares problems that incorporates a number of novel features. It maintains a secant approximation S to the second-order part of the least-squares Hessian and adaptively decides when to use this approximation. S is "sized" before updating, something that is similar to Oren-Luenberger scaling. The step choice algorithm is based on minimizing a local quadratic model of the sum-of-squares function constrained to an elliptical trust region centered at the current approximate minimizer. This is accomplished using ideas discussed by Moré, together with a special module for assessing the quality of the step thus computed. These and other ideas behind NL2SOL are discussed, and its evolution and current implementation are also described briefly.
Key Words and Phrases: unconstrained optimization, nonlinear least squares, nonlinear regression, quasi-Newton methods, secant methods
CR Categories: 5.14, 5.5
The Algorithm: NL2SOL: An Adaptive Nonlinear Least-Squares Algorithm. ACM Trans. Math. Softw. 7, 3 (Sept. 1981), 348-368.
1. INTRODUCTION
This project began in order to meet a need for a nonlinear least-squares algorithm which, in the large residual case, would be more reliable than the Gauss-Newton or Levenberg-Marquardt method [15] and more efficient than the secant or variable metric algorithms [17], such as the Davidon-Fletcher-Powell method, which are intended for general function minimization.
We have developed a satisfactory computer program called NL2SOL based on ideas in [18], and our primary purpose here is to report the details and to give
This work was supported in part by National Science Foundation Grants DCR75-10143, MCS76-00324, and SOC76-14311 to the National Bureau of Economic Research, Inc., and MCS79-06671 to the Massachusetts Institute of Technology, and was sponsored in part by NSF Grant MCS78-09525 and United States Army Contract DAAG29-75-C-0024 to the Mathematics Research Center at the University of Wisconsin-Madison.
Authors' addresses: J.E. Dennis, Department of Mathematical Sciences, Rice University, P.O. Box 1892, Houston, TX 77001; D.M. Gay, M.I.T./CCREMS, Room E38-278, Cambridge, MA 02139; R.E. Welsch, M.I.T./CCREMS, Room E53-383, Cambridge, MA 02139.
some test results. On the other hand, we learned so much during the development
that seems likely to be applicable in the development of other algorithms that we
have chosen to expand our exposition to include some of this experience.
In Section 2 we set out the problem and the notation we intend to use. Section
3 deals with our way of supplementing the classical Gauss-Newton approximation
to the least-squares Hessian by various analogs of the Davidon-Fletcher-Powell
method. Section 4 briefly describes our interpretation of the Oren-Luenberger [33] sizing strategy for this augmentation. In Section 5 we describe our adaptive
quadratic modeling of the objective function. Section 6 contains a discussion of
the stopping criteria and covariance matrices. Section 7 contains test results, and
Section 8 discusses the size of NL2SOL and the time it takes for housekeeping.
The NL2SOL Usage Summary is included in the accompanying algorithm.
    ∇f(x) = J(x)^T R(x)    (2.2)
Since (2.4) is the system of normal equations for the linear least-squares problem

    min_s (J(x_k)s + R(x_k))^T (J(x_k)s + R(x_k)),    (2.5)

it is better to obtain s_k from a QR decomposition of J(x_k) (see [27]).
We can view (2.5) as defining a quadratic model in x = x_k + s of the least-squares criterion function (2.1):

    q^G(x) = ½ R(x_k)^T R(x_k) + (x - x_k)^T J(x_k)^T R(x_k)
             + ½ (x - x_k)^T J(x_k)^T J(x_k) (x - x_k).    (2.6)
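To make this concrete, here is a minimal sketch (in Python with NumPy, ours rather than the package's FORTRAN) of computing the Gauss-Newton step from a QR factorization of J(x_k) instead of from the normal equations (2.4):

```python
import numpy as np

def gauss_newton_step(J, R):
    """Gauss-Newton step s_k: minimize ||J s + R||_2 via a QR factorization.

    Solving R1 s = -Q^T R avoids forming J^T J as in the normal
    equations (2.4), whose condition number is the square of J's.
    """
    Q, R1 = np.linalg.qr(J)        # thin QR: J = Q R1, R1 upper triangular
    return np.linalg.solve(R1, -(Q.T @ R))
```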
From (2.1)-(2.3) we see that the difference between this Gauss-Newton model and the usual Newton model obtained from a quadratic Taylor expansion around x_k is just the term ½(x - x_k)^T [Σ_i r_i(x_k) ∇²r_i(x_k)] (x - x_k).
The conceptual difference between these two models is interesting in that it exposes some reasons for the deficiencies of the Gauss-Newton algorithm. The Newton model is based on the assumption that f can be adequately modeled by a quadratic, while the Gauss-Newton model (2.6) is shown by (2.5) to result from the stronger assumption that R can be adequately modeled by an affine function.
J_{k+1} and R_{k+1}, into S_{k+1}. The standard way to do this is to ask the second-order approximant to transform the current x-change into the observed first-order change, that is,

    S_{k+1} Δx_k = Σ_i r_i(x_{k+1}) ∇²r_i(x_{k+1}) Δx_k
                 ≈ J_{k+1}^T R_{k+1} - J_k^T R_{k+1} =: y_k.    (3.2)
The crux of the problem can be seen by observing that even if R_{k+1} happened to be zero and even if y_k defined by (3.2) were used to make the update to S_k, then S_{k+1} Δx_k = y_k = 0, but S_{k+1} would be the same as S_k on the orthogonal complement of {Δx_k, y_k}.
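NL2SOL's own update of S is a DFP analog (Section 3); purely as a self-contained illustration of a secant update satisfying S_{k+1} Δx_k = y_k, the sketch below uses the simplest symmetric rank-one correction (our choice for exposition, not NL2SOL's formula):

```python
import numpy as np

def sr1_update(S, dx, y, tol=1e-8):
    """Symmetric rank-one secant update: returns S1 with S1 @ dx = y.

    Off span{dx, y - S dx}, S1 acts exactly like S, which is why a
    zero-residual y alone cannot erase stale second-order information.
    """
    r = y - S @ dx                  # secant residual
    denom = r @ dx
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(dx):
        return S                    # skip the update when it is ill defined
    return S + np.outer(r, r) / denom
```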
We use a straightforward modification of the Oren-Luenberger self-scaling technique [33]. The idea is to update τ_k S_k rather than S_k to get S_{k+1}. The scalar τ_k is chosen to try to shift the spectrum of S_k in hopes that the spectrum of τ_k S_k will overlap that of the second-order term we are approximating. We could take the scalar to be

    Δx_k^T y_k / Δx_k^T S_k Δx_k = [Δx_k^T S_k Δx_k / Δx_k^T Δx_k]^{-1} [Δx_k^T y_k / Δx_k^T Δx_k].

We prefer to call this sizing, and since we are primarily concerned with S_k being too large, we actually take

    τ_k = min{ |Δx_k^T y_k| / |Δx_k^T S_k Δx_k|, 1 }.    (4.1)

Whatever this strategy is called, notice that when R_{k+1} = 0, our y_k = 0, and so τ_k = 0 and S_{k+1} = 0. The use of sizing factor (4.1) made a significant difference in the performance of the algorithm. (See Table III.)
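In code, the sizing step is tiny; the following sketch (a paraphrase in Python under our notation, not the package's FORTRAN) applies (4.1) just before the secant update:

```python
import numpy as np

def size_factor(S, dx, y):
    """Sizing factor (4.1): tau = min(|dx.y| / |dx.S.dx|, 1).

    Shrinks S when it looks too large relative to the observed
    curvature, and collapses S entirely when y = 0 (zero residual).
    """
    sds = dx @ S @ dx
    if sds == 0.0:
        return 1.0                  # nothing to size against
    return min(abs(dx @ y) / abs(sds), 1.0)

# usage: S = size_factor(S, dx, y) * S, then apply the secant update
```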
Figure 1.
trust region may have to be shrunk repeatedly with attendant evaluations of the residual function R to obtain an acceptable x_{k+1}. Much more serious is the possibility of overflow. The initial step bound b_0, that is, the maximum length allowed for the very first step attempted, is a parameter in NL2SOL; so the initial assumption of global linearity can be overruled by making b_0 small.
Figure 1 will perhaps be helpful at this point. The ellipses represent the contours of q_k and the circle is the trust region; our picture assumes the diagonal scaling matrix D_k to be the identity and the Hessian approximation to be positive definite. The point N_k is the "Newton point" or global minimizer of the convex quadratic model q_k, and the curve s(r) represents the locus of minimizers of q_k(x_k + s) constrained by ||s||_2 ≤ r, 0 < r < ∞. Complete details, based largely on [31], can be found in [24], including the case where H_k is not positive definite, but we choose Δx_k = s(r) so that ||D_k Δx_k||_2 lies between 0.9 and 1.1 times the current trust radius.
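The locus s(r) can be traced with a Levenberg-Marquardt parameter. The sketch below is ours, not NL2SOL's: it assumes D = I and a positive definite model Hessian H, and it uses plain bisection rather than the safeguarded iteration of [24, 31], but it shows how a step whose length falls within the 0.9-1.1 tolerance around the trust radius can be found:

```python
import numpy as np

def trust_region_step(H, g, radius, lam_hi=1e8):
    """Approximately minimize 0.5 s'H s + g's subject to ||s|| <= radius.

    s(lam) = -(H + lam I)^{-1} g shrinks monotonically as lam grows,
    so bisect on lam until ||s|| lies in [0.9, 1.1] * radius.
    """
    p = H.shape[0]
    s = np.linalg.solve(H, -g)       # Newton point N_k
    if np.linalg.norm(s) <= 1.1 * radius:
        return s                     # Newton point is inside: accept it
    lo, hi = 0.0, lam_hi
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        s = np.linalg.solve(H + lam * np.eye(p), -g)
        ns = np.linalg.norm(s)
        if 0.9 * radius <= ns <= 1.1 * radius:
            break
        if ns > 1.1 * radius:
            lo = lam                 # step too long: increase lam
        else:
            hi = lam                 # step too short: decrease lam
    return s
```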
Since we were using this adaptive approach, it is not surprising that we also thought of using the new information at x_{k+1} to choose between q^S_{k+1} and q^G_{k+1} for use in determining x_{k+2}. We begin by default with S = 0 and hence with the Gauss-Newton model. Before giving our decision rules for step choice and model switching, we give some informal remarks that will probably be sufficient explanation for everyone except the specialist reader.
These other tests rely heavily on q_k, the current quadratic model, which seems very untrustworthy if (6.2) fails to hold. We do not worry if the latest step Δx_k actually increases the computed function value, since this may happen because of roundoff error. But we do return whichever of x_k and x_k + Δx_k gives the smallest computed function value.
Both the X-convergence and false-convergence tests employ the scale matrix D_k = diag(d_1, ..., d_p) mentioned in Section 5 to compute a scaled relative difference, RELDX(x, y, D), between two vectors x, y ∈ R^p. This could be defined in any of several ways. For simplicity, we have chosen the definition

    RELDX(x, y, D) := max_i |d_i(x_i - y_i)| / max_j (d_j(|x_j| + |y_j|)).    (6.3)
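In code, (6.3) is nearly a one-liner; a sketch in Python (ours; the package itself is FORTRAN):

```python
import numpy as np

def reldx(x, y, d):
    """Scaled relative difference (6.3) between vectors x and y.

    d holds the diagonal of the scale matrix D.
    """
    num = np.max(np.abs(d * (x - y)))
    den = np.max(d * (np.abs(x) + np.abs(y)))
    return num / den if den > 0 else 0.0
```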
C = current residual: C² = 2f(x_k).
B = optimal residual according to the Gauss-Newton model, for which H_k = J(x_k)^T J(x_k): B² = 2q_k(x_k - H_k^{-1} ∇f(x_k)).
A = projection of the current residual onto the column space of J(x_k), the current Jacobian: A² = C² - B².
c_k = cos θ = A/C: c_k² = left-hand side of (6.5).

Fig. 2. c_k for the Gauss-Newton model.
for a specified tolerance ε_F that should generally be less than ε_X. This may mean that the convergence tolerances in (6.1) and (6.4)-(6.6) are too small for the accuracy to which f and J are being computed, that there is an error in computing J, or that f or ∇f is discontinuous near x_k.
Earlier versions of NL2SOL included a stopping test called the COSMAX test that measured the cosines of the angles between the columns of the current Jacobian matrix and the corresponding residual vector. We would have preferred to examine c_k, the cosine of the angle between the residual vector and its orthogonal projection onto the column space of the Jacobian matrix, but this cosine would be expensive to compute for the augmented model. By contrast, c_k is readily available for the Gauss-Newton model, since it is then the square root of the left-hand side of (6.5); see Figure 2. For the Gauss-Newton model, (6.5) thus amounts to a test that we would have preferred to the COSMAX test, and for the augmented model it is a natural generalization of this preferred test. Several people have suggested tests based on c_k, including Allen [1] and Bates and Watts [4]. (See also Belsley's weighted gradient stopping test [6].)
Test (6.5) can also be motivated by statistical considerations. Since there is inherent variability in the data, it is generally not useful to continue iterating when a candidate step Δx_k = (Δx_k^1, ..., Δx_k^p) is generated for which

    max{ |Δx_k^i| / s.e.(x_k^i) : 1 ≤ i ≤ p }    (6.8)

is sufficiently small. Here s.e.(x_k^i) denotes some estimate of the standard error (square root of the variance) of the ith component of the current parameter vector estimate x_k and so is a function of the statistical variability in the data.
An alternative to (6.8) suggested by Pratt [37] is to consider general linear combinations l^T Δx_k of the components of Δx_k, that is,

    max{ |l^T Δx_k| / (l^T V_k l)^{1/2} : l ≠ 0 } = (Δx_k^T V_k^{-1} Δx_k)^{1/2},    (6.9)
where V_k is a current estimate of the covariance matrix. For s.e.(x_k^i) = (e_i^T V_k e_i)^{1/2}, where e_i is the ith standard unit vector, (6.9) clearly dominates (6.8), so it seems reasonable to base a test on (6.9). If we choose V_k = σ̂_k² H_k^{-1}, where σ̂_k² is the current residual sum of squares divided by max{1, n - p}, that is,

    σ̂_k² = 2f(x_k) / max{1, n - p},    (6.10)
and if Δx_k is a full Newton step, that is, Δx_k = -H_k^{-1} ∇f(x_k), then (6.9) equals max{1, n - p}^{1/2} times the square root of the left-hand side of (6.5).
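A sketch of both tests follows (ours, for illustration only; the tolerance and the way V is passed in are assumptions, with V_k chosen as in (6.10) or otherwise):

```python
import numpy as np

def pratt_stop(dx, V, tol):
    """Pratt-style test (6.9): stop when the step is small relative to
    the statistical variability encoded in the covariance estimate V."""
    t = np.sqrt(dx @ np.linalg.solve(V, dx))   # (dx' V^{-1} dx)^{1/2}
    return t <= tol

def componentwise_stop(dx, V, tol):
    """Test (6.8) with s.e.(x_i) = sqrt(V_ii); dominated by (6.9)."""
    se = np.sqrt(np.diag(V))
    return np.max(np.abs(dx) / se) <= tol
```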
Many statistical inference procedures require an estimate of the covariance matrix at the solution x*. NL2SOL provides three possibilities:

    σ̂² H^{-1} J^T J H^{-1},    (6.11)
    σ̂² H^{-1},    (6.12)

    σ̂² (J^T J)^{-1}.    (6.13)
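A sketch of the three estimates (ours; here H stands for the final Hessian approximation of Section 3, and σ̂² follows (6.10)):

```python
import numpy as np

def covariance_estimates(J, H, f, n, p):
    """Return the three covariance estimates (6.11)-(6.13).

    sigma2 follows (6.10): residual sum of squares 2f over max(1, n-p).
    """
    sigma2 = 2.0 * f / max(1, n - p)
    JtJ = J.T @ J
    Hinv = np.linalg.inv(H)
    return (
        sigma2 * Hinv @ JtJ @ Hinv,      # (6.11)
        sigma2 * Hinv,                   # (6.12)
        sigma2 * np.linalg.inv(JtJ),     # (6.13)
    )
```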
7. TEST RESULTS
We have run NL2SOL on a number of the test problems reported in the literature. In particular, we have run it on the test problems listed in [26] and on one described in [30]. The original sources for these problems, together with the abbreviated problem names used in Tables II-IV and some notes, are given in Table I.
The behavior of NL2SOL is determined in part by an integer array IV and a floating-point array V, which contain iteration and function evaluation limits, convergence tolerances, and other switches and constants. In the runs summarized in Tables II-IV, most of the IV and V input components (other than the iteration and function evaluation limits) had the default values given them by subroutine DFAULT. In particular, the initial step bound (trust radius), b_0 = V(LMAX0), had the value 100, and the convergence tolerances ε_A, ε_X, ε_R, ε_F that appear in (6.1) and (6.4)-(6.7) had the following values: ε_A = V(AFCTOL) = 10^{-20}; ε_X = V(XCTOL) = 1.49 × 10^{-8}; ε_R = V(RFCTOL) = 10^{-10}; and ε_F = V(XFTOL) = 2.22 × 10^{-14}. The values just mentioned are the defaults for the double-precision version of NL2SOL on IBM 360 and 370 computers: we obtained Tables II-IV on the IBM 370/168 at the Massachusetts Institute of Technology; the double-precision arithmetic on this machine has a unit roundoff of 16^{-13} = 2.22 × 10^{-16}.
(Except as noted below and except for the runs stopped by the iteration or function evaluation limits, all runs reported in Tables III and IV found essentially the same function value listed in Table II.)
Table II summarizes the performance of NL2SOL on the test problem set when all IV and V input components (except the iteration and function evaluation limits) have their default values. Following a suggestion of J. J. Moré [private communication], we obtained new starting guesses for many of the test problems by multiplying the standard starting guess by 10 and 100. The column labeled LS gives the base 10 logarithm of the factor by which the standard starting guess was multiplied. The problem dimensions appear in the columns headed N and P, while the number of function (i.e., R(x)) and gradient (i.e., J(x)) evaluations performed, respectively, appear under NF and NG. Located under C is a code telling why NL2SOL stopped: X means X-convergence, R means relative function-convergence, B means both X and R, A means absolute function-convergence, S means singular convergence, F means false convergence, and I and E mean that the iteration or function evaluation limit, respectively, was reached.
Table I. Test problems: abbreviated names, notes, and original sources

ROSNBROK [40]
HELIX (note 1) [22]
SINGULAR [35]
WOODS [11]
ZANGWILL (note 2) [44]
ENGVALL [19]
BRANIN [9]
BEALE [5]
CRAGG (note 3) [26]
BOX [8]
DAVIDON1 (note 4) [13]
FRDSTEIN (note 5) [23]
WATSON6,9,12,20 (note 6) [29]
CHEBQD8 [20]
BROWN (note 7) [10]
BARD [2]
JENNRICH [28]
KOWALIK [29]
OSBORNE1,2 [34]
MEYER [30]
Notes
1. The residual vector R(x) for this problem is a discontinuous function of x. On those runs where NL2SOL halts with false convergence, the iterates have converged to a point of discontinuity.
2. This is a linear least-squares problem that NL2SOL solves in one step when the initial step bound, that is, V(LMAX0), is increased somewhat from its default value of 100 (to at least 174).
3. The original Miele problem described in [12], which Gill and Murray [26] cite as the source for this problem, does not have the residual component x_4 - 1. This new component forces x_4 to move more rapidly toward 1, but otherwise causes no noteworthy change in the performance of NL2SOL.
4. This is a very ill-conditioned linear least-squares problem. If V(LMAX0) is set large (to at least 1.9 × 10^7), then NL2SOL halts with X-convergence after two steps when using double-precision arithmetic on an IBM 370 computer. With a double precision of a few bits more accuracy, such as that of the Honeywell 6180 or the Univac 1110, NL2SOL attains absolute function convergence after a single step.
5. In all our test runs, NL2SOL found a local solution to this problem. The residual vector vanishes at the global solution.
6. WATSON20 lies near the boundary between zero-residual and nonzero-residual problems. After the first dozen or so iterations, NL2SOL can neither make further substantial reductions in the sum of squares nor satisfy any of its default convergence criteria. To reduce the computer time spent on this problem, we used a function evaluation limit of 20 and an iteration limit of 15 on all runs of WATSON20 reported here.
7. Gill and Murray [26] call this problem "Davidon 2".
Table III. Test results with alternative settings (D = I, DEFAULT, PURE GN, PURE S, NO SIZING). NF and NG are numbers of function and gradient evaluations; C is the convergence code; a ? marks a value unreadable in our copy.

PROBLEM    LS   D = I       DEFAULT     PURE GN     PURE S      NO SIZING   NOTE
ROSNBROK    0   22  18 A    26  19 A    18  15 A    23  21 A    31  22 A
ROSNBROK    1   28  24 A    57  39 A    38  29 A   155  69 X    30  25 A
ROSNBROK    2   77  54 A   141 121 A   115 101 A   400 146 E    89  82 A
HELIX       0    9   9 X    13  11 A    17  14 A    15  14 X    14  12 X
HELIX       1   11   9 A    19  16 A    15  13 A    23  18 X    18  14 A
HELIX       2   16  14 X   103  45 F    28  23 X    25  19 X    80  37 F
SINGULAR    0   20  20 A    20  20 A    20  20 A    32  32 A    20  20 A
SINGULAR    1   23  23 A    26  25 A    25  24 A    40  39 A    26  25 A
SINGULAR    2   28  27 A    34  27 A    34  27 A    49  44 A    34  27 A
WOODS       0   61  45 X    70  47 A    80  64 A    45  39 A    70  48 A
WOODS       1   63  46 A    59  46 A    97  70 A    47  3? A   117  70 A
WOODS       2   72  52 X    77  53 X    89  65 A    63  45 X    93  65 A
ZANGWILL    0    3   3 A     3   3 A     5   3 A     3   3 A     3   3 A
ENGVALL     0   17  15 A    17  13 X    14  12 X    19  17 X    18  13 X
ENGVALL     1   20  18 A    21  19 X    20  18 X    27  21 R    20  18 A    1
ENGVALL     2   27  25 R    31  26 A   100  72 A    44  37 X    36  30 A    2
BRANIN      0    2   2 A     2   2 A     2   2 A     2   2 A     2   2 A
BRANIN      1   17  15 A    19  15 A    14  12 A    28  25 A    17  15 A
BRANIN      2   16  14 A    20  10 A    21  12 A    49  38 A    20  10 A
BEALE       0   10   8 A    10   9 A    10   9 X    19  15 A    10   9 X
BEALE       1    8   8 A     6   6 A     6   6 A    13  12 A     6   6 A
CRAGG       0   23  22 A    24  23 A    23  22 A    34  32 X    24  23 A
CRAGG       1   54  47 A    80  47 R   150  91 R    75  48 R   120  78 R    3
BOX         0    7   7 X     7   7 X     7   7 X     8   8 X     7   7 X
BOX         1   27  19 R    16  10 S    12  11 R    45  20 S    16  11 F
DAVIDON1    0    3   3 X    20  15 X    19  14 X    20  16 I    20  15 X
FRDSTEIN    0    9   9 R     9   8 R    27  16 F     9   8 R     9   7 R    4
FRDSTEIN    1   18  15 R    18  13 R    44  22 F    22  19 R    18  13 R    4
FRDSTEIN    2   28  23 B    28  19 R    53  26 F    58  30 B    30  20 R    4
WATSON6     0    8   8 R    12  10 B    12  11 R    16  12 B    12  10 B
WATSON9     0   10   9 R    10   9 R    11   9 B    21  14 B    10   9 R
WATSON12    0   14  11 R    14  12 R    16  14 R    23  17 B    15  12 B
WATSON20    0   20  14 E    18  16 I    18  16 I    18  16 I    18  16 I    5
CHEBQD8     0   22  16 R    23  18 B    58  36 F    20  16 R    80  35 F    4, 6
CHEBQD8     1   78  63 R    77  57 R   400 109 E   143 105 R   102  78 S
BROWN       0   14  13 R    18  17 R   305 301 I    19  17 R    32  30 R
BROWN       1   15  15 R    22  16 R   400 281 E    27  21 R   107  64 R
BROWN       2   24  23 R    31  21 R   400 296 E    35  26 R    40  27 R
BARD        0    7   7 R     7   7 R     7   7 R    11  10 B     7   7 R
BARD        1   36  22 S    32  23 S    32  23 S    81  42 S    32  23 S    7
BARD        2   37  23 B    70  28 R    63  27 R   129  58 R    75  32 S    8
JENNRICH    0   16  12 B    15  13 R    35  19 F    11  11 R    16  14 R    4
KOWALIK     0   14  12 R    11  10 R    1?  17 R    15  12 R    11  10 R
KOWALIK     1  189  88 S   130  75 S   127  77 S   127  73 R    93  65 S    9
KOWALIK     2  112  69 R    75  58 R    96  81 R   400 200 E   138 124 R
OSBORNE1    0   34  26 R    27  22 R    18  16 R    54  31 R    18  16 R
OSBORNE2    0   15  13 R    17  16 B    15  14 R    16  15 R    16  15 R
OSBORNE2    1   16  12 S    26  12 S    16  11 S    28  16 S    27  13 S
MADSEN      0   12  12 R    12  12 R    33  33 R    12  12 R    13  13 R
MADSEN      1   14  14 B    16  15 R    39  36 R    19  18 R    21  19 R
MADSEN      2   21  20 B    28  20 R    47  40 R    28  23 R    37  29 R
MEYER       0  380 229 B   335 206 X   346 213 B   156 129 B   322 199 B
Notes
1. The PURE S run found a local minimizer x* having f(x*) = 56.1.
2. The D = I run also found f(x*) = 56.1.
3. All runs found different local minimizers: for D = I, f(x*) = 1.68 × 10^{-21}; for DEFAULT, f(x*) = 6.17 × 10^4; for PURE GN, f(x*) = 1.50 × 10^?; for PURE S, f(x*) = 2.30 × 10^7; and for NO SIZING, f(x*) = 1.35 × 10^5.
4. In the PURE GN runs of these problems, NL2SOL reports false convergence because the Jacobian is (nearly) singular at the solutions found and the Gauss-Newton Hessian differs sufficiently from the true one that the singular convergence test is not satisfied with the convergence tolerances at their default values. If the b_0 in (6.6) were changed from 100 to 1, then NL2SOL would report singular convergence on JENNRICH, and if the ε_R in (6.6) were also increased slightly from 10^{-10}, say to 2.3 × 10^{-10}, then NL2SOL would also report singular convergence on FRDSTEIN. Note that the true Hessian is quite positive definite at the solutions found.
5. The final function values were as follows: for D = I, 1.36 × 10^{-16}; for DEFAULT, 6.51 × 10^{-18}; for PURE GN, 6.50 × 10^{-18}; for PURE S, 3.49 × 10^{-16}; and for NO SIZING, 4.98 × 10^{-18}.
6. In the NO SIZING run, NL2SOL often tried the augmented model, but always switched back to the Gauss-Newton model. (This run computed slightly different iterates than the corresponding PURE GN run because the latter used S ≡ 0 in (7.1).)
7. The PURE S run found f(x*) ≈ 8.57.
8. The NO SIZING run found f(x*) = 5.74 × 10^{-2}.
9. The PURE S run found f(x*) = 1.54 × 10^{-4}.
Table IV. Test results with the DEFAULT, NO IMODSW, NO INTDBL, and NO GRDTST settings (NF, NG, and C as in Table III; a ? marks a value unreadable in our copy).

PROBLEM    LS   DEFAULT     NO IMODSW   NO INTDBL   NO GRDTST
ROSNBROK    0   26  19 A    27  19 A    26  21 X    21  18 ?
ROSNBROK    1   57  39 A    36  29 A    74  57 A    45  35 A
ROSNBROK    2  141 121 A   135 115 A   164 1?5 A   210 155 A
HELIX       0   13  11 A    13  11 A    13  11 A    17  14 X
HELIX       1   19  16 A    17  14 A    19  16 A    19  15 A
HELIX       2  103  45 F   110  53 F    20  16 X    99  43 F
SINGULAR    0   20  20 A    20  20 A    20  20 A    20  20 A
SINGULAR    1   26  25 A    26  25 A    25  25 A    28  25 A
SINGULAR    2   34  27 A    34  27 A    31  31 A    54  27 A
WOODS       0   70  47 A    65  47 X    59  51 A    75  49 X
WOODS       1   59  46 A    71  48 X    41  37 A    54  42 A
WOODS       2   77  53 X    77  53 X    59  55 X    87  54 A
ZANGWILL    0    3   3 A     3   3 A     3   3 A     3   3 A
ENGVALL     0   17  13 X    16  13 X    17  14 X    17  13 X
ENGVALL     1   21  19 X    23  19 X    16  16 X    22  19 A
ENGVALL     2   31  26 A    31  26 A    56  34 X    36  28 A
BRANIN      0    2   2 A     2   2 A     2   2 A     2   2 A
BRANIN      1   19  15 A    18  1? A    25  25 A    16  13 A
BRANIN      2   20  10 A    20  10 A    39  3? A    23  12 A
BEALE       0   10   9 A    11   9 A    10   9 A    10   9 A
BEALE       1    6   6 A     6   6 A     6   6 A     6   6 A
CRAGG       0   24  23 A    23  21 A    24  23 A    25  24 A
CRAGG       1   80  47 R    ?0  47 R   109  93 R    80  49 ?
BOX         0    7   7 X     7   7 X     7   7 X     7   7 X
BOX         1   16  10 S    17  10 S    16  10 S    15  11 R
DAVIDON1    0   20  15 X    20  15 X    16  16 I    20  13 E
FRDSTEIN    0    9   8 R     8   7 R     9   8 R     9   8 R
FRDSTEIN    1   18  13 R    18  13 R    19  17 R    18  14 B
FRDSTEIN    2   28  19 R    35  22 R    29  27 R    29  20 B
WATSON6     0   12  10 B    12  10 B    11  10 B    12  10 B
WATSON9     0   10   9 R    10   9 R    10   9 B    11   9 B
WATSON12    0   14  12 R    14  12 R    14  13 R    18  13 ?
WATSON20    0   18  16 I    18  16 I    16  16 I    20  15 E
CHEBQD8     0   23  18 B    24  19 B    19  14 R    23  18 B
CHEBQD8     1   77  57 R   118  76 R   112  98 R   104  82 R
BROWN       0   18  17 R    18  17 R    20  19 R    20  18 R
BROWN       1   22  16 R    25  19 R    24  25 R    26  20 B
BROWN       2   31  21 R    32  22 B    30  29 B    32  23 R
BARD        0    7   7 R     7   7 R     7   7 R     7   7 R
BARD        1   32  23 S    32  23 S    29  29 S    32  23 S
BARD        2   70  28 R    77  28 R    66  43 R    66  30 R
JENNRICH    0   15  13 R    15  13 R    15  13 R    15  13 R
KOWALIK     0   11  10 R    13  10 B    11  10 R    13  10 R
KOWALIK     1  130  75 S   244 100 F   109  84 S   124  76 S
KOWALIK     2   75  58 R    74  58 R    78  62 R   117  ?1 R
OSBORNE1    0   27  22 R    31  22 R    28  23 R    34  23 R
OSBORNE2    0   17  16 B    17  16 B    17  16 B    18  16 B
OSBORNE2    1   26  12 S    27  12 S    15  12 F    26  12 S
MADSEN      0   12  12 R    12  12 R    12  12 R    12  12 R
MADSEN      1   16  15 R    16  15 R    18  18 B    19  17 R
MADSEN      2   28  20 R    29  21 R    25  25 R    27  22 R
MEYER       0  335 206 X   343 214 B   209 181 B   351 213 X
Notes
1. The NO INTDBL run found f(x*) = 9.30 × 10^4, and the NO GRDTST run found f(x*) = 8.65 × 10^4.
2. The final function values were as follows: for DEFAULT and NO IMODSW, 6.51 × 10^{-18}; for NO INTDBL, 3.48 × 10^{-17}; for NO GRDTST, 9.71 × 10^{-18}.
3. If the defaults for b_0 or ε_R in (6.6) were slightly relaxed (e.g., if b_0 were reduced from 100 to 50, or if ε_R were increased from 10^{-10} to 1.5 × 10^{-10}), then the NO IMODSW run would also report singular convergence.
Table V. NL2SOL (D = I) and three SUMSOL runs: (J**T)*J, LMAX0=1, and H0 = I (see the text for the settings these names denote).

PROBLEM    LS   NL2SOL, D=I  (J**T)*J    LMAX0=1     H0 = I      NOTE
ROSNBROK    0   22  18 A    50  37 X    39  32 X    40  36 X
ROSNBROK    1   28  24 A    70  56 A   104  74 X    96  70 X
ROSNBROK    2   77  54 A   340 251 I   229 174 X   201 146 X    1
HELIX       0    9   9 X    39  30 X    41  33 X    38  28 X
HELIX       1   11   9 A    47  35 X    57  40 X    48  34 X
HELIX       2   16  14 X    57  40 X    57  37 X    35  25 X
SINGULAR    0   20  20 A    45  45 A    77  75 A    80  75 A
SINGULAR    1   23  23 A    53  53 A    88  86 A    91  82 A
SINGULAR    2   28  27 A    91  89 A   108  99 A    99  90 A
WOODS       0   61  45 X   102  75 X   128  89 X   103  79 X
WOODS       1   63  46 A   130  99 X    92  72 X    80  61 X
WOODS       2   72  52 X    96  83 X    79  70 X    72  54 X
ZANGWILL    0    3   3 A     3   3 A     6   3 A    10   7 A
ENGVALL     0   17  15 A    35  30 X    36  32 X    33  30 X
ENGVALL     1   20  18 A    53  42 X    56  45 X    43  39 X
ENGVALL     2   27  25 R    83  75 X    79  71 X    66  55 X    2
BRANIN      0    2   2 A     2   2 A    19  16 A    18  15 A
BRANIN      1   17  15 A    28  28 A    38  34 A    38  33 A
BRANIN      2   16  14 A    51  49 A    64  56 A    48  35 A
BEALE       0   10   8 A    21  17 X    18  13 X    17  14 A
BEALE       1    8   8 A    19  17 X    16  15 X    16  15 X
CRAGG       0   23  22 A   118 108 A   119 112 A   115 102 A
CRAGG       1   54  47 A    88  76 R   128  90 R   185 116 R    3
BOX         0    7   7 X    16  15 X    29  22 X    48  35 A
BOX         1   27  19 R    39  20 X    52  41 B    37  27 B
DAVIDON1    0    3   3 X     4   4 X     6   5 X    20   2 F
FRDSTEIN    0    9   9 R     9   9 R     9   8 R    13  11 R
FRDSTEIN    1   18  15 R    29  24 R    30  25 R    30  24 R
FRDSTEIN    2   28  23 B    44  38 R    51  39 R    55  39 R
WATSON6     0    8   8 R    25  21 R    25  21 R    41  34 R
WATSON9     0   10   9 R    22  22 B    22  22 B    81  72 R
WATSON12    0   14  11 R    32  27 R    33  28 B   125 110 R    4
WATSON20    0   20  14 E    16  16 I    16  16 I    18  16 I    5
CHEBQD8     0   22  16 R    40  32 R    38  28 R    39  27 R
CHEBQD8     1   78  63 R   234 208 B   232 208 B   227 186 R
BROWN       0   14  13 R    25  21 R    24  20 R    46  35 R
BROWN       1   15  15 R    45  43 R    52  46 R    41  30 R
BROWN       2   24  23 R    78  73 R    80  71 R    47  38 R
BARD        0    7   7 R    20  16 R    17  16 R    22  18 R
BARD        1   36  22 S    79  59 S    66  46 R    34  23 R    6
BARD        2   37  23 B    80  55 S    89  49 R    73  43 R    7
JENNRICH    0   16  12 B    16  14 R    16  14 R    34  22 R
KOWALIK     0   14  12 R    27  19 R    27  19 R    42  33 R
KOWALIK     1  189  88 S   220 159 S    55  48 S    91  73 R    8
KOWALIK     2  112  69 R    78  56 S   112  72 R   221 124 R    9
OSBORNE1    0   34  26 R    56  42 R    56  42 R    83  59 R
OSBORNE2    0   15  13 R    37  32 R    43  34 R    75  59 R
OSBORNE2    1   16  12 S    28  20 R    52  31 B    53  31 B
MADSEN      0   12  12 R    15  15 R    13  13 R    16  16 R
MADSEN      1   14  14 B    30  28 R    30  28 B    31  28 R
MADSEN      2   21  20 B    36  35 R    39  32 R    41  36 R
MEYER       0  380 229 B   400 268 E   400 277 E   400 259 B   10
Notes
1. The (J**T)*J run stopped with f(x) = 1.32.
2. NL2SOL found a local solution with f(x*) = 56.1; the SUMSOL runs all found the global solution.
3. NL2SOL found the global solution, and each SUMSOL run found a different local solution: for (J**T)*J, f(x*) = 232; for LMAX0=1, f(x*) = 33.0; and for H0 = I, f(x*) = 1.27 × 10^6.
4. The H0 = I run of SUMSOL found f(x*) = 1.33 × 10^{-7}.
5. The final function values were as follows: for NL2SOL, 1.36 × 10^{-16}; for (J**T)*J, 0.290; for LMAX0=1, 0.293; and for H0 = I, 4.13 × 10^{-3}.
6. The (J**T)*J run found f(x*) = 8.51 and the LMAX0=1 run found f(x*) = 1.18.
7. The (J**T)*J run found f(x*) = 5.74 × 10^{-2} and the LMAX0=1 run found f(x*) = 0.943.
8. The LMAX0=1 run found f(x*) = 2.90 × 10^{-3} and the H0 = I run found f(x*) = 1.54 × 10^{-4} (as did all runs for LS = 0).
9. The (J**T)*J run found f(x*) = 3.40 × 10^{-3} and the H0 = I run found f(x*) = 4.71 × 10^{-4}.
10. The final function values for the SUMSOL runs were as follows: for (J**T)*J, 359; for LMAX0=1, 189; and for H0 = I, 237.
to select the steps it tries. It uses the same convergence tests as NL2SOL
(performed, in fact, by the same ASSESS module), so the return codes in the
columns labeled C in Table V have the same meaning as for the earlier tables.
Like NL2SOL, SUMSOL employs a scale matrix D, which can be updated from the diagonal elements of the Hessian approximation, but to eliminate the effects of different updates to D, we report only results for D = I here. The columns labeled NL2SOL, D = I repeat the D = I columns of Table III. Those labeled (J**T)*J show what happens when the initial Hessian approximation supplied to SUMSOL is H_0 = J_0^T J_0, where J_0 = J(x_0) is the initial Jacobian matrix. (SUMSOL actually works only with the Cholesky factor L of its Hessian approximation H = LL^T, and the initial L supplied in the (J**T)*J run was obtained from a QR factorization of J_0.) The columns labeled LMAX0=1 show what happens when the initial step bound is decreased from the default value that NL2SOL uses, that is, 100, to the default value for SUMSOL, that is, 1.0, and everything else is the same as for the (J**T)*J run. The columns labeled H0 = I show what happens when SUMSOL sets its initial Hessian approximation to the identity matrix with everything else as for the LMAX0=1 run. Except as listed in the notes in Table V, all runs found the final function value reported in Table II. None of the SUMSOL runs dominates or is dominated by any of the other SUMSOL runs.
On problems where both find the same locally optimal function value, NL2SOL
generally requires fewer--sometimes substantially fewer--function and gradient
evaluations than SUMSOL, so in cases where function evaluations are expensive,
Table V suggests that it is quite worthwhile to exploit the structure present in
the least-squares Hessian.
REFERENCES
1. ALLEN, D.M. Private communication, 1976.
2. BARD, Y. Comparison of gradient methods for the solution of nonlinear parameter estimation problems. SIAM J. Numer. Anal. 7 (1970), 157-186.
3. BARD, Y. Nonlinear Parameter Estimation. Academic Press, New York, 1974.
4. BATES, D.M., AND WATTS, D.G. An orthogonality convergence criterion for nonlinear least squares. Queen's Mathematical Preprint 1979-14, Queen's Univ., Kingston, Ont., Canada, 1979.
5. BEALE, E.M.L. On an iterative method for finding a local minimum of a function of more than one variable. Tech. Rep. 25, Statistical Techniques Research Group, Princeton Univ., Princeton, N.J., 1958.
6. BELSLEY, D.A. On the efficient computation of the nonlinear full-information maximum likelihood estimator. Tech. Rep. 5, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, Mass., 1980.
7. BETTS, J.T. Solving the nonlinear least squares problem: Application of a general method. J. Optim. Theory Appl. 18 (1976), 469-484.
8. BOX, M.J. A comparison of several current optimization methods and the use of transformations in constrained problems. Comput. J. 9 (1966), 67-77.
9. BRANIN, F.H. Widely convergent method for finding multiple solutions of simultaneous nonlinear equations. IBM J. Res. Develop. 16 (1971), 504-522.
10. BROWN, K.M., AND DENNIS, J.E. A new algorithm for nonlinear least-squares curve fitting. In Mathematical Software, J.R. Rice, Ed., Academic Press, New York, 1971, pp. 391-396.
11. COLVILLE, A.R. A comparative study of nonlinear programming codes. Tech. Rep. 320-2949, IBM New York Scientific Center, 1968.
12. CRAGG, E.E., AND LEVY, A.V. Study on a supermemory gradient method for the minimization of functions. J. Optim. Theory Appl. 4 (1969), 191-205.
13. DAVIDON, W.C. New least-square algorithms. J. Optim. Theory Appl. 18 (1976), 187-197.
14. DENNIS, J.E. Some computational techniques for the nonlinear least squares problem. In Numerical Solutions of Systems of Nonlinear Equations, G.D. Byrne and C.A. Hall, Eds., Academic Press, New York, 1973, pp. 157-183.
15. DENNIS, J.E. Nonlinear least squares and equations. In The State of the Art in Numerical Analysis, D. Jacobs, Ed., Academic Press, London, 1977, pp. 269-312.
16. DENNIS, J.E., AND MEI, H.H.-W. Two new unconstrained optimization algorithms which use function and gradient values. J. Optim. Theory Appl. 28 (1979), 453-482.
17. DENNIS, J.E., AND MORÉ, J.J. Quasi-Newton methods, motivation and theory. SIAM Rev. 19 (1977), 46-89.
18. DENNIS, J.E., AND WELSCH, R.E. Techniques for nonlinear least squares and robust regression. Commun. Statist. B7 (1978), 345-359.
19. ENGVALL, J.L. Numerical algorithm for solving over-determined systems of nonlinear equations. NASA Document N70-35600, 1966.
20. FLETCHER, R. Function minimization without evaluating derivatives - a review. Comput. J. 8 (1965), 33-41.
21. FLETCHER, R. A modified Marquardt subroutine for nonlinear least squares. Rep. R6799, AERE, Harwell, England, 1971.
22. FLETCHER, R., AND POWELL, M.J.D. A rapidly convergent descent method for minimization. Comput. J. 6 (1963), 163-168.
23. FREUDENSTEIN, F., AND ROTH, B. Numerical solution of systems of nonlinear equations. J. ACM 10, 4 (Oct. 1963), 550-556.
24. GAY, D.M. Computing optimal locally constrained steps. SIAM J. Sci. Statist. Comput. 2, 2 (June 1981), 186-197.
25. GAY, D.M. Subroutines for general unconstrained minimization using the model/trust-region approach. Tech. Rep. 18, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, 1980.
26. GILL, P.E., AND MURRAY, W. Algorithms for the solution of the nonlinear least-squares problem. SIAM J. Numer. Anal. 15 (1978), 977-992.
27. GOLUB, G.H. Matrix decompositions and statistical calculations. In Statistical Computation, R.C. Milton and J.A. Nelder, Eds., Academic Press, New York, 1969, pp. 365-397.
28. JENNRICH, R.I., AND SAMPSON, P.F. Application of step-wise regression to nonlinear estimation. Technometrics 10 (1968), 63-72.
29. KOWALIK, J.S., AND OSBORNE, M.R. Methods for Unconstrained Optimization Problems. American Elsevier, New York, 1968.
30. MEYER, R.R. Theoretical and computational aspects of nonlinear regression. In Nonlinear Programming, J.B. Rosen, O.L. Mangasarian, and K. Ritter, Eds., Academic Press, New York, 1970.
31. MORÉ, J.J. The Levenberg-Marquardt algorithm: Implementation and theory. In Lecture Notes in Mathematics No. 630: Numerical Analysis, G. Watson, Ed., Springer-Verlag, New York, 1978, pp. 105-116.
32. MORÉ, J.J. Implementation and testing of optimization software. DAMTP Rep. 79/NA4, Cambridge Univ., Cambridge, England, 1979.
33. OREN, S.S. Self-scaling variable metric algorithms without line search for unconstrained minimization. Math. Comput. 27 (1973), 873-885.
34. OSBORNE, M.R. Some aspects of nonlinear least squares calculations. In Numerical Methods for Nonlinear Optimization, F.A. Lootsma, Ed., Academic Press, New York, 1972.
35. POWELL, M.J.D. An iterative method for finding stationary values of a function of several variables. Comput. J. 5 (1962), 147-151.
36. POWELL, M.J.D. A FORTRAN subroutine for unconstrained minimization, requiring first derivatives of the objective function. Rep. AERE-R.6469, AERE Harwell, England, 1970.
37. PRATT, J.W. When to stop a quasi-Newton search for a maximum likelihood estimate. Working Paper 77-16, Harvard School of Business, Cambridge, Mass., 1977.
38. RAO, C.R. Linear Statistical Inference and Its Applications, 2nd ed., Wiley, New York, 1973.
39. REINSCH, C.H. Smoothing by spline functions. II. Numer. Math. 16 (1971), 451-454.
40. ROSENBROCK, H.H. An automatic method for finding the greatest or least value of a function. Comput. J. 3 (1960), 175-184.
41. WEDIN, P.-Å. The non-linear least squares problem from a numerical point of view, I and II. Comput. Sci. Tech. Reps., Lund Univ., Lund, Sweden, 1972 and 1974.
42. WEDIN, P.-Å. On surface dependent properties of methods for separable non-linear least squares problems. ITM Arbetsrapport nr 23, Inst. för Tillämpad Matematik, Stockholm, Sweden, 1974.
43. WEDIN, P.-Å. On the Gauss-Newton method for the non-linear least squares problem. ITM Arbetsrapport nr 24, Inst. för Tillämpad Matematik, Stockholm, Sweden, 1974.
44. ZANGWILL, W.J. Nonlinear programming via penalty functions. Manage. Sci. 13 (1967), 344-358.
Received September 1977; revised August 1979 and September 1980; accepted April 1981