COMPUTATIONAL EXPERIENCE WITH ALGORITHMS FOR SEPARABLE NONLINEAR LEAST SQUARES PROBLEMS (1)

C. CORRADI (2) - L. STEFANINI (3)
ABSTRACT. Nonlinear least squares problems frequently arise in which the fitting function can be written as a linear combination of functions involving further parameters in a nonlinear manner. This paper outlines an efficient implementation of an iterative procedure originally developed by Golub and Pereyra and subsequently modified by various authors, which takes advantage of the linear-nonlinear structure, and investigates its performance on various test problems as compared with the standard Gauss-Newton and Gauss-Newton-Marquardt schemes.
1. Introduction.
The last several years have witnessed an increasing interest in separable nonlinear least squares problems, that is, nonlinear least squares problems in which the parameters to be solved for can be separated into a linear part and a nonlinear part. Typical applications are least squares approximations by exponential or rational functions.
Various techniques which take advantage of the special structure of this kind of model have been proposed by several authors: they generally consist of two-stage iterative processes, each stage dealing with one set of parameters. For a comprehensive bibliography the reader should refer to [5]; see also [7], [12], [11], [1] and the references cited there.
Received May 25, 1977.
(1) A preliminary version of this note was presented at the CNR-GNIM meeting held in Florence, September 1976.
(2) CNEN, Centro di Calcolo, and Istituto di Scienze Economiche, Università di Bologna.
(3) SOGESTA s.p.a., Urbino.
The purpose of this note is to compare the performances of the generalized Gauss-Newton (GN) and Gauss-Newton-Marquardt (GNM) algorithms, modified for separable problems along the lines suggested by Golub and Pereyra [5] and Kaufman [7], with those of the corresponding standard GN and GNM methods on a set of test problems commonly used for numerical experiments. In Section 2 we
outline an efficient implementation of the mentioned algorithms, while in
Section 3 we discuss some experimental results.
2. Gauss-Newton and Gauss-Newton-Marquardt algorithms for separable nonlinear least squares.
The problem of interest can be stated briefly as follows: determine the parameter vectors $a \in R^m$, $b \in R^n$ to minimize the functional

(1)  $F(b, a) = \| X(a)\, b + f(a) - y \|^2$,

where $y \in R^T$ is a given vector of observations on the dependent variable, $X(a)$ is a given $T \times n$ matrix of observations on the independent variables whose elements are functions of the parameter vector $a$ only, and $f$ is a given vector function of $a$, (possibly) involving some of the independent variables. All the functions are assumed to be continuously differentiable with respect to $a$. We shall refer to $b$ and $a$ as the linear and nonlinear parameters respectively.
Clearly the standard approach is to minimize (1) with respect to the vector $[b', a']' \in R^{m+n}$ as a whole, that is, all parameters are considered nonlinear.
The technique proposed by Golub and Pereyra is based on the observation that for any fixed $a$, the optimal choice for $b$ is

(2)  $\hat b = X^+(a)\, v(a)$,

where $v(a) = y - f(a)$, and $X^+$ denotes the Moore-Penrose inverse of $X$. Substituting in (1) yields a problem in the nonlinear parameters only:

(3)  $\| [X(a)\, X^+(a) - I]\, v(a) \|^2 \equiv \| r(a) \|^2$.

The two approaches (1), (3) are shown to be equivalent under the assumption that $X(a)$ has constant rank for $a$ belonging to some open set $\Omega$ containing the desired solution. In what follows we shall assume that $X(a)$ has full rank for $a \in \Omega$.
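For illustration, the projected residual $r(a)$ and the optimal linear parameters $\hat b(a)$ of (2)-(3) can be evaluated as in the following minimal Python sketch (not the authors' implementation; the helper names X_of_a and f_of_a are placeholders for problem-specific routines):

    import numpy as np

    def vp_residual(a, x, y, X_of_a, f_of_a):
        # Evaluate r(a) of (3) and b_hat of (2) for fixed nonlinear a.
        # X_of_a(a, x) returns the T x n matrix X(a);
        # f_of_a(a, x) returns the T-vector f(a).
        X = X_of_a(a, x)
        v = y - f_of_a(a, x)                           # v(a) = y - f(a)
        b_hat, *_ = np.linalg.lstsq(X, v, rcond=None)  # b_hat = X^+(a) v(a)
        r = X @ b_hat - v                              # r(a) = [X X^+ - I] v(a)
        return r, b_hat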
Problem (3) can be solved by some nonlinear least squares method, e.g. the GN iteration:

(4)  $a_{i+1} = a_i - \gamma_i\, [D(r(a_i))]^+\, r(a_i) = a_i - \gamma_i\, h_i$,

or the GNM iteration:

(5)  $a_{i+1} = a_i - \begin{bmatrix} D(r(a_i)) \\ \nu_i I \end{bmatrix}^+ \begin{bmatrix} r(a_i) \\ 0 \end{bmatrix}$,

where [5]

$D(r(a)) = (X(a)\, X^+(a) - I)\, [D(X(a))\, X^+(a)\, v(a) + D(f(a))] + [(X(a)\, X^+(a) - I)\, D(X(a))\, X^+(a)]'\, r(a)$.
The procedure can be improved in the following way.
Let us first construct an orthogonal factorization of $X$, e.g. using Householder transformations,

(6)  $Q\, X\, S = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$,

for any fixed $a$, where $Q$ is $T \times T$ orthogonal, $R_1$ is $n \times n$ upper triangular, $S$ is an $n \times n$ permutation matrix. Then we have

$X\, X^+ - I = -\, Q' \begin{bmatrix} 0 & 0 \\ 0 & I_{T-n} \end{bmatrix} Q$

and, by the well known isometric property of orthogonal matrices,

$\| [X(a)\, X^+(a) - I]\, v(a) \| = \| Q_2(a)\, v(a) \| \equiv \| r_2(a) \|$,

where $Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}$, $Q_1$ consisting of the first $n$ rows of $Q$ and $Q_2$ of the remaining $T - n$ rows.
The corresponding GN or GNM iterations, which have been proposed by Krogh [8], are obtained from (4), (5) by using $r_2(a)$, $Q_2\, D(r(a))$ in place of $r(a)$, $D(r(a))$. This modification yields a slight reduction of the amount of computation required per iteration.
A further improvement is described by Kaufman [7], who suggests ignoring the second term in the expression of $Q_2\, D(r(a))$, so that the iterative schemes reduce to

(4')  $a_{j+1} = a_j - \gamma_j\, [W_2(a_j)]^+\, r_2(a_j)$,

and

(5')  $a_{j+1} = a_j - \begin{bmatrix} W_2(a_j) \\ \nu_j I \end{bmatrix}^+ \begin{bmatrix} r_2(a_j) \\ 0 \end{bmatrix}$,

respectively; here we have denoted

$W_2(a) = Q_2(a)\, [D(X(a))\, X^+(a)\, v(a) + D(f(a))]$.
The resulting algorithms, which obviously require less computational effort, are shown in [12] to have roughly the same asymptotic convergence rate as those proposed by Golub and Pereyra. Following [12], we shall refer to (4') and (5') as the restricted GN and GNM algorithms respectively.
Let us now describe a recommended computational scheme for the restricted and unrestricted (or full) GN and GNM methods.

i) Restricted GN and GNM.
STEP 1. Choose an initial guess for a.
STEP 2. Compute the factorization (6). Then $\hat b(a) = X^+(a)\, v(a)$ is computed by solving the upper triangular system $R_1\, \bar b = d_1$, where

$Q\, v = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$, $d_1 \in R^n$, $d_2 \in R^{T-n}$,

and setting $\hat b = S\, \bar b$.

REMARK. This step corresponds to the least squares solution of the linear problem $\| X(a)\, b - v(a) \|^2$ for fixed $a$.
STEP 3. Compute

$W = Q\, [D(X)\, \hat b + D(f)] = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix}$, $W_1$ of order $n \times m$, $W_2$ of order $(T-n) \times m$.

REMARK. The explicit computation of $W_1$ is not needed unless one wants to evaluate the estimated variance-covariance matrix of the parameters (for details see [5]).
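In a modern environment, Steps 2 and 3's linear stage could be sketched as follows (an illustration built on scipy's pivoted Householder QR, not the authors' FORTRAN routines):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def linear_stage(X, v):
        # Factorization (6): X[:, piv] = Q [R1; 0], Q orthogonal.
        T, n = X.shape
        Q, R, piv = qr(X, pivoting=True)
        d = Q.T @ v                                  # d = [d1; d2], d1 in R^n
        b_bar = solve_triangular(R[:n, :n], d[:n])   # R1 b_bar = d1
        b_hat = np.empty(n)
        b_hat[piv] = b_bar                           # undo the permutation S
        r2 = d[n:]                                   # ||r(a)|| = ||Q2 v(a)||
        Q2 = Q[:, n:].T                              # rows spanning the residual space
        return b_hat, r2, Q2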
STEP 4. Perform an iteration according to (4') or (5'). Using Householder transformations again, obtain

$W_2 = \bar Q' \begin{bmatrix} R_2 \\ 0 \end{bmatrix} \bar S'$,

where $\bar Q$ is $(T-n) \times (T-n)$ orthogonal, $R_2$ is $m \times m$ upper triangular, $\bar S$ is an $m \times m$ permutation matrix.

Then solve the upper triangular system $R_2\, \bar h = -\bar d_1$, where

$\bar Q\, r_2 = \begin{bmatrix} \bar d_1 \\ \bar d_2 \end{bmatrix}$, $\bar d_1 \in R^m$, $\bar d_2 \in R^{T-n-m}$,

and put $h = \bar S\, \bar h$. This is the correction term for GN.
For the GNM iteration it is necessary to complete the decomposition of the augmented jacobian

(7)  $\begin{bmatrix} W_2 \\ \nu I \end{bmatrix}$

according to the scheme

(8)  $\begin{bmatrix} W_2 \\ \nu I \end{bmatrix} = \tilde Q' \begin{bmatrix} \tilde R_2 \\ 0 \end{bmatrix} \bar S'$,

where $\tilde Q$ is $(T-n+m) \times (T-n+m)$ orthogonal; the first stage is the factorization of $W_2$ above, and the second stage merely re-triangularizes $R_2$ stacked on the damping block.

Then solve the upper triangular system $\tilde R_2\, \bar h = -\tilde d_1$, where $\tilde d_1$ collects the first $m$ components of the transformed right-hand side, and set $h = \bar S\, \bar h$.
REMARK 1. The two stage orthogonal decomposition (8) - originally described
by Jennings and Osborne [6] - requires less computational effort than the
standard one stage factorization. Moreover, it permits the calculation of the
Moore-Penrose inverse of the matrix (7) for different values of $\nu$ without
complete refactorization of the matrix itself. Note that pivoting is not needed for
the second stage of (8), since as a result of the first stage the Euclidean norms of the columns of the augmented jacobian turn out to be in descending order.
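The payoff of the two stage scheme can be seen in a small sketch: once $W_2$ has been factored, a new damping factor only requires re-triangularizing the $2m \times m$ stack of $R_2$ on $\nu I$ (a Python illustration under the same assumptions as the previous sketches, not the Jennings-Osborne code):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def gnm_correction(R2, d1_bar, nu):
        # Second stage of (8): with Q_bar W2 S_bar = [R2; 0] already done
        # and d1_bar the first m components of Q_bar r2, solve the damped
        # least squares problem min || [R2; nu I] h_bar + [d1_bar; 0] ||.
        m = R2.shape[0]
        A = np.vstack([R2, nu * np.eye(m)])
        rhs = np.concatenate([-d1_bar, np.zeros(m)])
        Q, R = qr(A, mode='economic')
        h_bar = solve_triangular(R, Q.T @ rhs)
        return h_bar                        # h = S_bar h_bar after permuting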
REMARK 2. For the actual implementation of the generalized GN and of the GNM algorithm we need to compute a) the steplength $\gamma$ and b) the damping factor $\nu$ respectively. Convenient schemes are the following.

a) Set $\gamma = 1$ (full step GN). Then if $\| r_2(a - \gamma h) \|^2 \geq \| r_2(a) \|^2$ put $\gamma \leftarrow \mathrm{DECR} \cdot \gamma$ and repeat the test; otherwise go to the next step.

b) Put $\nu = c\, \sqrt{N}$, where $N$ denotes the maximum norm of $r_2(a)$ corresponding to the current value of $a$, i.e. $N = \| r_2(a) \|_\infty$, and

$c = \begin{cases} 10 & \text{if } 10 < N, \\ 1 & \text{if } 1 < N \leq 10, \\ 0.01 & \text{if } N \leq 1 \end{cases}$

(see [2]). Then if $\| r_2(a - h) \|^2 \geq \| r_2(a) \|^2$ put $\nu \leftarrow \mathrm{EXP} \cdot \nu$, perform the second stage of (8) to get a new value of $h$, and repeat the test; otherwise go to the next step.

Suggested values for DECR and EXP are 0.5 and 1.5 respectively [10].
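Schemes a) and b) are straightforward to code; a hedged sketch follows (the routines r2_of and h_of_nu, returning the projected residual and the correction for a given damping factor, are hypothetical hooks supplied by the surrounding algorithm):

    import numpy as np

    def steplength(a, h, r2_of, DECR=0.5, max_halvings=30):
        # Scheme a): shrink gamma until the sum of squares decreases.
        gamma, f0 = 1.0, np.sum(r2_of(a) ** 2)
        while np.sum(r2_of(a - gamma * h) ** 2) >= f0 and max_halvings > 0:
            gamma *= DECR
            max_halvings -= 1
        return gamma

    def damping(a, r2_of, h_of_nu, EXP=1.5, max_tries=30):
        # Scheme b): initial nu = c * sqrt(N), N = ||r2(a)||_inf (see [2]),
        # then inflate nu, redoing only the second stage of (8).
        r2 = r2_of(a)
        N = np.max(np.abs(r2))
        c = 10.0 if N > 10 else (1.0 if N > 1 else 0.01)
        nu, f0 = c * np.sqrt(N), np.sum(r2 ** 2)
        h = h_of_nu(nu)
        while np.sum(r2_of(a - h) ** 2) >= f0 and max_tries > 0:
            nu *= EXP
            h = h_of_nu(nu)
            max_tries -= 1
        return nu, h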
STEP 5. Update the value of a according to (4') or (5') and if the convergence criterion is not satisfied go to Step 2.
ii) Full GN and GNM.
STEP 1. Choose initial guesses for a and for b.
STEP 2. Compute the jacobian

$D[X(a)\, b + f(a)] = [X(a) \quad D(X(a))\, b + D(f(a))] = W$.

Factorize

$Q\, W\, S = \begin{bmatrix} R \\ 0 \end{bmatrix}$,

where $Q$ is $T \times T$ orthogonal, $R$ is $(m+n) \times (m+n)$ upper triangular, and $S$ is a $(m+n) \times (m+n)$ permutation matrix.

Solve the system $R\, \bar h = c_1$ and set $h = S\, \bar h$, where $\bar y = y - X(a)\, b - f(a)$ denotes the current residual vector and

$Q\, \bar y = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, $c_1 \in R^{m+n}$, $c_2 \in R^{T-m-n}$.
Then update the current values of the parameters according to the GN iteration:

$\begin{bmatrix} b_{i+1} \\ a_{i+1} \end{bmatrix} = \begin{bmatrix} b_i \\ a_i \end{bmatrix} + \gamma_i\, h_i$,

or the GNM iteration, after completing the factorization of the augmented jacobian

$\begin{bmatrix} W \\ \nu I \end{bmatrix} = \hat Q' \begin{bmatrix} \hat R \\ 0 \end{bmatrix} S'$, with $\hat Q \begin{bmatrix} \bar y \\ 0 \end{bmatrix} = \begin{bmatrix} \hat c_1 \\ \hat c_2 \end{bmatrix}$

($\hat Q$ is $(T + m + n) \times (T + m + n)$ orthogonal), and solving the upper triangular system $\hat R\, \bar h = \hat c_1$, whence $h = S\, \bar h$, the parameters being updated as in the GN case.
STEP 3. Test convergence, and go to Step 2 if the convergence criterion is
not met.
Further details as well as FORTRAN listings are given in [5], [4].
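For comparison with the restricted scheme, one full (unrestricted) GN step can be sketched as follows (an illustration only; the derivative helpers DX_of_a and Df_of_a, returning $D(X(a))$ as a T x n x m array and $D(f(a))$ as a T x m array, are hypothetical):

    import numpy as np

    def full_gn_step(a, b, x, y, X_of_a, f_of_a, DX_of_a, Df_of_a, gamma=1.0):
        # Jacobian of X(a) b + f(a) w.r.t. [b', a']' is [X, D(X) b + D(f)].
        X = X_of_a(a, x)
        DXb = np.einsum('tnk,n->tk', DX_of_a(a, x), b)
        J = np.hstack([X, DXb + Df_of_a(a, x)])
        res = y - X @ b - f_of_a(a, x)               # current residual
        h, *_ = np.linalg.lstsq(J, res, rcond=None)  # GN correction
        return b + gamma * h[:b.size], a + gamma * h[b.size:]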
It is customary to compare different iterative procedures in terms of the number of «multiplications» needed to perform one iteration. Let the number of multiplications for the unrestricted resp. restricted GN (GNM) be $N$ resp. $N_r$ ($N'$ resp. $N_r'$); then a simple but tedious calculation shows that

$N - N_r = nT + 3mn > 0$,

$N' - N_r' = N - N_r + n\, (7m + \tfrac{7}{2} n + \tfrac{25}{2}) > 0$,

i.e. the restricted algorithms allow a slight saving of computation for any combination of the parameters $T$, $m$, $n$. It may be remarked that this feature has not been pointed out in the existing literature.
To summarize, the restricted algorithms seem to offer a priori two main advantages over the unrestricted ones:
i) reduction in dimension of the parameter space - a potentially valuable feature, especially when poor starting guesses are available; and
ii) slightly reduced computational cost per iteration.
A comparative investigation of the effectiveness of the above algorithms
in treating a variety of test problems is given in the next Section.
3. Experimental results.
Some comparisons of the effectiveness of the algorithms outlined in the previous Section have been given in the literature [5], [7], [8]. However, examination of the test results presented so far can only give a fragmentary picture of the relative performances of the restricted and unrestricted algorithms, because each study used different test examples which were run on different computers. Moreover, Kaufman's algorithm, which seems to be the most efficient, has been tested by the author on two examples only. We feel that a more desirable state of affairs would be to conduct an evaluation using a uniform set of standards and test problems. Our experiments consist of two test series.
We made one series of tests on six problems which have been used by various authors of nonlinear least squares algorithms. Some of them have assumed the role of «classics», being repeatedly used to compare the performances of algorithms.
The list of the test problems is given below. In what follows the term «optimal values» refers to those estimates of the parameters that have given the least known value of the sum of squares of the residuals.
EXAMPLE 1, [2]. Model:

$b\, [\exp(-10x) - \exp(-x)] + \exp(-a_1 x) - \exp(-a_2 x)$;

here T = 10, n = 1, m = 2.

Initial values:
a) a = (0, 20)', b = 1.745, ‖r‖² = 0.3944;
b) a = (0, 10)', b = 1.660, ‖r‖² = 7.984E−2.

Optimal values:
a1* = a2*, b* = 0; a1* = 1, a2* = 10, b* = 1; a1* = 10, a2* = 1, b* = −1; ‖r*‖² = 0.
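Cast in the separable form of Section 2, Example 1 has a single basis column multiplying b and a purely nonlinear part. A sketch (the data x, y are those of [2] and are not reproduced here):

    import numpy as np

    def X_ex1(a, x):
        # T x 1 matrix X(a): the column multiplying the linear parameter b.
        return (np.exp(-10.0 * x) - np.exp(-x)).reshape(-1, 1)

    def f_ex1(a, x):
        # purely nonlinear part f(a).
        return np.exp(-a[0] * x) - np.exp(-a[1] * x)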
EXAMPLE 2, [8]. Model:

$b_1\, \{\exp[q\, (x_1 - x_2 a_1)\, a_2] - 1\}/x_2 + b_2\, \{\exp[q\, (x_1 - x_2 a_1)\, a_3] - 1\}/x_2 + b_3\, (x_1 - x_2 a_1)/x_2$;

here T = 19, n = 3, m = 3.
Initial values:
a) a = (2.055, 0.4721, 1)', b = (4.418E−8, 3.523E−12, 3.265E−5)', ‖r‖² = 6.204E−3;
b) a = (2.055, 0.05, 1)', b = (8.894E−5, 4.866E−12, −1.612E−4)', ‖r‖² = 0.7708.

Optimal values:
a* = (1.673, 0.4272, 0.917)', b* = (2.009E−11, 7.433E−8, 3.156E−5)', ‖r*‖² = 4.475E−3.
EXAMPLE 3, [10]. Model:

$b_1 + b_2 \exp(-a_1 x) + b_3 \exp(-a_2 x)$;

here T = 33, n = 3, m = 2.

Initial values:
a = (0.01, 0.02)', b = (0.3266, 1.521, −0.9729)', ‖r‖² = 4.918E−3.

Optimal values:
a* = (0.01287, 0.02212)', b* = (0.3754, 1.935, −1.464)', ‖r*‖² = 5.465E−5.
EXAMPLE 4, [10]. Model:

$b_1 \exp(-a_1 x) + b_2 \exp[-a_2 (x - a_3)^2] + b_3 \exp[-a_4 (x - a_5)^2] + b_4 \exp[-a_6 (x - a_7)^2]$;

here T = 65, n = 4, m = 7.

Initial values:
a = (0.6, 3, 2, 5, 4.5, 7, 5.5)', b = (1.313, 0.3342, 0.7262, 0.8198)', ‖r‖² = 1.289.

Optimal values:
a* = (0.7540, 0.9052, 2.398, 1.365, 4.568, 4.824, 5.675)', b* = (1.309, 0.4315, 0.6336, 0.5993)', ‖r*‖² = 0.04014.

EXAMPLE 5, [9]. Model:

$b\, a_1 x_1 / (1 + a_1 x_1 + a_2 x_2)$;

here T = 5, n = 1, m = 2.
Initial values:
a = (10.39, 48.83)', b = 0.5058, ‖r‖² = 0.01378.

Optimal values:
a* = (3.151, 15.16)', b* = 0.7800, ‖r*‖² = 4.355E−5.
EXAMPLE 6, [9]. Model:

$b_1 + b_2 \exp(a x)$;

here T = 10, n = 2, m = 1.

Initial values:
a = 0.5, b = (17.35, 1.967E−9)', ‖r‖² = 1.978.

Optimal values:
a* = 0.02, b* = (15.50, 12)', ‖r*‖² = 1.E−12.
We point out that the initial values of the linear parameters - required by the full GN and GNM - were chosen according to (2), i.e. given $a_0$, we put $b_0 = X^+(a_0)\, v(a_0)$. (The initial values of the nonlinear parameters are taken from the references.) Using this trick, which is recommended in [5] even for standard nonlinear least squares procedures, the unrestricted GN and GNM give significantly better results than those reported in the literature for the same problems.
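In the notation of the earlier sketches, this initialization amounts to one call of the projection step (assuming the data arrays x, y and the helper routines defined above are in scope):

    a0 = np.array([0.0, 20.0])                    # starting guess of case a)
    r0, b0 = vp_residual(a0, x, y, X_ex1, f_ex1)  # b0 = X^+(a0) v(a0)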
Table 1 presents the results of fitting the above models with the data values reported in the mentioned references. Here we have denoted by NI, NF the number of iterations and the number of function evaluations to convergence. More precisely, NI is to be understood as the number of gradient calls, while NF denotes the total number of times the objective function was calculated. The termination criterion was set at four significant digits in the parameter estimates; the iteration cutoff was set at 50 due to computer time limitations, and was completely arbitrary. Also tabulated is the obtained sum of squared residuals. All computational work was performed in single precision arithmetic on a CDC 6600 computer. In order to eliminate the coding aspect, we have used essentially the same routines for the unrestricted and the corresponding restricted version of the algorithms.
The above results are typical of the relative performances of the algorithms. In fact, in all problems that have been tested, the restricted methods either behaved at least as well as the corresponding unrestricted methods - even using the mentioned «optimal» choice of the starting values for the linear parameters - or succeeded where GN or GNM had failed to obtain the optimal values. A
closer inspection of Table 1 indicates that the restricted GN slightly outperformed the restricted GNM, and both were quite superior to the unrestricted methods. It is remarkable that our test with the restricted GNM for problem 2 showed convergence in fewer iterations than reported in [8]. The same remark applies to our results for problems 3 and 4, as compared to those given by Kaufman. Finally, we point out that computational results with our version of the standard GNM showed faster convergence than reported in [10] for problems 4 and 5, even using the same starting guesses as provided by Osborne. For further comparisons the reader can refer to the mentioned references.
In the second test series we tested the algorithms on the approximation of artificially generated data of the form [12]

$y_t = \exp(-0.1\, x_t) - \exp(-2\, x_t) + \varepsilon\, \eta_t$,  t = 1, ..., 25,

where the $\eta_t$ are taken from a sequence of normal pseudo-random numbers with mean 0 and standard deviation 1. By varying the value of $\varepsilon$ we can simulate a range of possible sets of data values, and this permits testing the robustness of the algorithms. The results of some representative runs are listed in Table 2. All runs were started at (0.2, 1.8).
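The generator of the second series is easy to reproduce (a sketch; the abscissae x_t are an assumption, since they are not specified here):

    import numpy as np

    rng = np.random.default_rng(1976)       # any seed; eta_t ~ N(0, 1)
    x = np.arange(1.0, 26.0)                # hypothetical x_t, t = 1, ..., 25
    def make_data(eps):
        eta = rng.standard_normal(x.size)
        return np.exp(-0.1 * x) - np.exp(-2.0 * x) + eps * eta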
It will be seen that the methods behaved identically when $\varepsilon$ is reasonably small, while the restricted GN algorithm yielded a significant improvement in iterations for quite large perturbations.
In addition to the test examples described above, the restricted algorithms have been used successfully in the solution of several «real world» problems arising in various application fields such as econometrics (e.g. maximum likelihood estimation of the linear expenditure system) and physical chemistry (e.g. vapour-pressure models in the vapour-liquid equilibrium of pure substances), showing the same qualitative behaviour as reported in this Section.
4. Conclusions.
Numerical comparisons are notoriously difficult to generalize, involving as
they do factors such as choice of test function, starting position, convergence
criterion, steplength or damping factor, as well as the computer software and
hardware characteristics.
However, from the work accomplished thus far the conclusion can be drawn that the linear-nonlinear approach described in Section 2 yields a remarkable improvement over the standard all-nonlinear methods. In fact the results we have seen indicate that the restricted algorithms are less sensitive to the choice of the starting guesses, require fewer iterations to converge, and are more reliable than the traditional GN and GNM schemes even in perturbed cases.
[Table 1. NI, NF and the final sum of squared residuals obtained by the restricted and unrestricted GN and GNM algorithms on Examples 1-6.]

[Table 2. Representative runs on the artificially perturbed data for various values of ε.]
REFERENCES

[1] R. H. BARHAM, W. DRANE, An algorithm for least squares estimation of nonlinear parameters when some of the parameters are linear, Technometrics 14 (1972), 757-766.
[2] K. M. BROWN, J. E. DENNIS, Derivative free analogues of the Levenberg-Marquardt and Gauss algorithms for nonlinear least squares approximation, Numer. Math. 18 (1972), 289-297.
[3] C. CORRADI, L. STEFANINI, Un algoritmo per la soluzione di problemi di minimi quadrati non lineari, C.N.E.N. tech. rept. RT/EDP (76)5, (1976).
[4] L. GARDINI, L. STEFANINI, Procedure di minimi quadrati lineari e non lineari basate su trasformazioni ortogonali, SOGESTA internal report A-253, (1976).
[5] G. H. GOLUB, V. PEREYRA, The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate, SIAM J. Numer. Anal. 10 (1973), 413-432.
[6] L. S. JENNINGS, M. R. OSBORNE, Applications of orthogonal matrix transformations to the solution of systems of linear and nonlinear equations, Tech. Rept. 37, Computer Centre, Australian National University, (1970).
[7] L. KAUFMAN, A variable projection method for solving separable nonlinear least squares problems, BIT 15 (1975), 49-57.
[8] F. T. KROGH, Efficient implementation of a variable projection algorithm for nonlinear least squares problems, Comm. ACM 17 (1974), 167-169.
[9] R. R. MEYER, P. M. ROTH, Modified damped least squares: an algorithm for nonlinear estimation, J. Inst. Math. Appl. 9 (1972), 218-233.
[10] M. R. OSBORNE, Some aspects of nonlinear least squares calculations, in «Numerical methods for nonlinear optimization», edited by F. A. Lootsma (1972), Acad. Press, London and New York, 171-189.
[11] M. R. OSBORNE, Some special nonlinear least squares problems, SIAM J. Numer. Anal. 12 (1975), 571-591.
[12] A. RUHE, P. A. WEDIN, Algorithms for separable nonlinear least squares problems, Stanford Computer Science Tech. Rept. 434, (1974).