COMPUTATIONAL EXPERIENCE WITH ALGORITHMS FOR SEPARABLE NONLINEAR LEAST SQUARES PROBLEMS (1)

C. CORRADI (2) - L. STEFANINI (3)

ABSTRACT - Nonlinear least squares problems frequently arise in which the fitting function can be written as a linear combination of functions involving further parameters in a nonlinear manner. This paper outlines an efficient implementation of an iterative procedure originally developed by Golub and Pereyra and subsequently modified by various authors, which takes advantage of the linear-nonlinear structure, and investigates its performance on various test problems as compared with the standard Gauss-Newton and Gauss-Newton-Marquardt schemes.

1. Introduction.

The last several years have witnessed an increasing interest in separable nonlinear least squares problems, that is, nonlinear least squares problems in which the parameters to be solved for can be separated into a linear part and a nonlinear part. Typical applications are least squares approximations by exponential or rational functions. Various techniques that take advantage of the special structure of this kind of model have been proposed by several authors: they generally consist of two-stage iterative processes, each stage dealing with one set of parameters. For a comprehensive bibliography the reader should refer to [5]; see also [7], [12], [11], [1] and the references cited there.

Received May 25, 1977.
(1) A preliminary version of this note was presented at the CNR-GNIM meeting held in Florence, September 1976.
(2) CNEN, Centro di Calcolo, and Istituto di Scienze Economiche, Università di Bologna.
(3) SOGESTA s.p.a., Urbino.

The purpose of this note is to compare the performance of the generalized Gauss-Newton (GN) and Gauss-Newton-Marquardt (GNM) algorithms modified for separable problems along the lines suggested by Golub and Pereyra [5] and Kaufman [7], and that of the corresponding standard GN and GNM methods, on a set of test problems commonly used for numerical experiments. In Section 2 we outline an efficient implementation of the mentioned algorithms, while in Section 3 we discuss some experimental results.
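For instance (an illustration of ours, in the spirit of the exponential test problems of Section 3), a fit by a sum of two exponentials separates as

$\min_{a,\,b} \sum_{t=1}^{T} \bigl( b_1 e^{-a_1 x_t} + b_2 e^{-a_2 x_t} - y_t \bigr)^2 = \min_a \min_b \| X(a)\, b - y \|^2, \qquad [X(a)]_{tj} = e^{-a_j x_t},$

so that for each fixed value of the nonlinear parameters $a$ the inner minimization over the linear parameters $b$ is an ordinary linear least squares problem.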
2. Gauss-Newton and Gauss-Newton-Marquardt algorithms for separable nonlinear least squares.

The problem of interest can be stated briefly as follows: determine the parameter vectors $a \in R^m$, $b \in R^n$ to minimize the functional

(1)  $F(b, a) = \| X(a)\, b + f(a) - y \|^2$,

where $y \in R^T$ is a given vector of observations on the dependent variable, $X(a)$ is a given $T \times n$ matrix of observations on the independent variables whose elements are functions of the parameter vector $a$ only, and $f$ is a given vector function of $a$, (possibly) involving some of the independent variables. All the functions are assumed to be continuously differentiable with respect to $a$. We shall refer to $b$ and $a$ as the linear and nonlinear parameters respectively.

The standard approach is to minimize (1) with respect to the vector $[b', a']' \in R^{m+n}$ as a whole, that is, all parameters are considered nonlinear. The technique proposed by Golub and Pereyra is based on the observation that for any fixed $a$ the optimal choice for $b$ is

(2)  $\hat b = X^+(a)\, v(a)$,

where $v(a) = y - f(a)$ and $X^+$ denotes the Moore-Penrose inverse of $X$. Substituting in (1) yields a problem in the nonlinear parameters only:

(3)  $\| [X(a) X^+(a) - I]\, v(a) \|^2 = \| r(a) \|^2$.

The two approaches (1), (3) are shown to be equivalent under the assumption that $X(a)$ has constant rank for $a$ belonging to some open set $\Omega$ containing the desired solution. In what follows we shall assume that $X(a)$ has full rank for $a \in \Omega$.

Problem (3) can be solved by some nonlinear least squares method, e.g. the GN iteration

(4)  $a_{i+1} = a_i - \gamma_i\, [D(r(a_i))]^+\, r(a_i) = a_i - \gamma_i h_i$,

or the GNM iteration

(5)  $a_{i+1} = a_i - \begin{bmatrix} D(r(a_i)) \\ \nu_i I \end{bmatrix}^+ \begin{bmatrix} r(a_i) \\ 0 \end{bmatrix}$,

where [5]

$D(r(a)) = (X(a) X^+(a) - I)\, [D(X(a)) X^+(a)\, v(a) + D(f(a))] + [(X(a) X^+(a) - I)\, D(X(a))\, X^+(a)]'\, v(a)$.

The procedure can be improved in the following way. Let us first construct an orthogonal factorization of $X$, e.g. using Householder transformations,

(6)  $Q(a)\, X(a)\, S = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$

for any fixed $a$, where $Q$ is $T \times T$ orthogonal, $R_1$ is $n \times n$ upper triangular, and $S$ is an $n \times n$ permutation matrix. Then we have

$X X^+ - I = -\, Q' \begin{bmatrix} 0 & 0 \\ 0 & I_{T-n} \end{bmatrix} Q$

and, by the well known isometric property of orthogonal matrices,

$\| [X(a) X^+(a) - I]\, v(a) \| = \| Q_2(a)\, v(a) \| = \| r_2(a) \|$, where $Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}$,

$Q_1$ having $n$ and $Q_2$ having $T - n$ rows. The corresponding GN or GNM iterations, which have been proposed by Krogh [8], are obtained from (4), (5) by using $r_2(a)$, $Q_2 D(r(a))$ in place of $r(a)$, $D(r(a))$. This modification yields a slight reduction of the amount of computation required per iteration. A further improvement is described by Kaufman [7], who suggests ignoring the second term in the expression of $Q_2 D(r(a))$, so that the iterative schemes reduce to

(4')  $a_{j+1} = a_j - \gamma_j\, [W_2(a_j)]^+\, r_2(a_j)$

and

(5')  $a_{j+1} = a_j - \begin{bmatrix} W_2(a_j) \\ \nu_j I \end{bmatrix}^+ \begin{bmatrix} r_2(a_j) \\ 0 \end{bmatrix}$

respectively; here we have denoted

$W_2(a) = Q_2(a)\, [D(X(a))\, X^+(a)\, v(a) + D(f(a))]$.

The resulting algorithms, which obviously require less computational effort, are shown [12] to have roughly the same asymptotic convergence rate as those proposed by Golub and Pereyra. Following [12], we shall refer to (4') and (5') as the restricted GN and GNM algorithms respectively.

Let us now describe a recommended computational scheme for the restricted and unrestricted (or full) GN and GNM methods.

i) Restricted GN and GNM.

STEP 1. Choose an initial guess for $a$.

STEP 2. Compute the factorization (6). Then $\hat b(a) = X^+(a)\, v(a)$ is computed by solving the upper triangular system $R_1 \tilde b = d_1$, where

$Q v = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$,

$d_1$ having $n$ and $d_2$ having $T - n$ components, and setting $\hat b = S \tilde b$.

REMARK. This step corresponds to the least squares solution of the linear problem $\| X(a)\, b - v(a) \|^2$ for fixed $a$.
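Step 2 may be made concrete by the following sketch. It is our own minimal rendering, not the authors' code (their implementation is in FORTRAN, cf. [3], [4]); it omits the column permutation $S$ of (6), relying on the full-rank assumption, and the helper names `model_X` and `model_f` are hypothetical.

```python
import numpy as np

def vp_residual(a, x, y, model_X, model_f=None):
    """Optimal linear parameters b_hat of (2) and reduced residual r2(a)
    of (3), obtained via the orthogonal factorization (6)."""
    X = model_X(a, x)                          # T x n matrix X(a)
    v = y - (model_f(a, x) if model_f is not None else 0.0)  # v(a) = y - f(a)
    T, n = X.shape
    Q, R = np.linalg.qr(X, mode="complete")    # X = Q [R1; 0], Q: T x T
    d = Q.T @ v                                # d = [d1; d2] as in Step 2
    b_hat = np.linalg.solve(R[:n, :], d[:n])   # R1 b = d1, i.e. eq. (2)
    r2 = d[n:]                                 # r2(a) = Q2 v(a); ||r2|| = ||r(a)||
    return b_hat, r2
```

For a pure exponential-sum model $\sum_j b_j e^{-a_j x_t}$ one would take, e.g., `model_X = lambda a, x: np.exp(-np.outer(x, a))` and leave `model_f = None`.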
STEP 3. Compute

$W = Q\, [D(X)\, \hat b + D(f)] = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix}$,

where $W_1$ has $n$ and $W_2$ has $T - n$ rows.

REMARK. The explicit computation of $W_1$ is not needed unless one wants to evaluate the estimated variance-covariance matrix of the parameters (for details see [5]).

STEP 4. Perform an iteration according to (4') or (5'). Using Householder transformations again, obtain

$\tilde Q\, W_2\, \tilde S = \begin{bmatrix} R_2 \\ 0 \end{bmatrix}$,

where $\tilde Q$ is $(T-n) \times (T-n)$ orthogonal, $R_2$ is $m \times m$ upper triangular, and $\tilde S$ is an $m \times m$ permutation matrix. Then solve the upper triangular system $R_2 \tilde h = \tilde d_1$, where

$\tilde Q\, r_2 = \begin{bmatrix} \tilde d_1 \\ \tilde d_2 \end{bmatrix}$,

$\tilde d_1$ having $m$ and $\tilde d_2$ having $T - n - m$ components, and put $h = \tilde S \tilde h$. This is the correction term for GN.

For the GNM iteration it is necessary to complete the decomposition of the augmented jacobian

(7)  $\begin{bmatrix} W_2 \\ \nu I \end{bmatrix}$

according to the two-stage scheme

(8)  $\begin{bmatrix} \tilde Q & 0 \\ 0 & I_m \end{bmatrix} \begin{bmatrix} W_2 \\ \nu I \end{bmatrix} \tilde S = \begin{bmatrix} R_2 \\ 0 \\ \nu \tilde S \end{bmatrix}, \qquad \hat Q \begin{bmatrix} R_2 \\ \nu \tilde S \end{bmatrix} = \begin{bmatrix} \bar R_2 \\ 0 \end{bmatrix}$,

the composite transformation being $(T - n + m) \times (T - n + m)$ orthogonal. Then solve the upper triangular system $\bar R_2 \tilde h = \bar d_1$, where $\bar d_1$ collects the first $m$ components of the correspondingly transformed vector $[r_2', 0']'$, and set $h = \tilde S \tilde h$.

REMARK 1. The two-stage orthogonal decomposition (8) - originally described by Jennings and Osborne [6] - requires less computational effort than the standard one-stage factorization. Moreover, it permits the calculation of the Moore-Penrose inverse of the matrix (7) for different values of $\nu$ without complete refactorization of the matrix itself. Note that pivoting is not needed for the second stage of (8), since as a result of the first stage the euclidean norms of the columns of the augmented jacobian turn out to be in descending order.

REMARK 2. For the actual implementation of the generalized GN and GNM algorithms we need to compute a) the steplength $\gamma$ and b) the damping factor $\nu$ respectively. Convenient schemes are the following.

a) Set $\gamma = 1$ (full step GN). Then if $\| r_2(a - \gamma h) \|^2 \ge \| r_2(a) \|^2$ put $\gamma \leftarrow \mathrm{DECR} \cdot \gamma$ and repeat the test; otherwise go to the next step.

b) Put $\nu = c \sqrt N$, where $N$ denotes the maximum norm of $r_2(a)$ corresponding to the current value of $a$, i.e. $N = \| r_2(a) \|_\infty$, and

$c = 10$ if $10 < N$, $\quad c = 1$ if $1 < N \le 10$, $\quad c = 0.01$ if $N \le 1$

(see [2]). Then if $\| r_2(a - h) \|^2 \ge \| r_2(a) \|^2$ put $\nu \leftarrow \mathrm{EXP} \cdot \nu$, perform the second stage of (8) to get a new value of $h$, and repeat the test; otherwise go to the next step. Suggested values for DECR and EXP are 0.5 and 1.5 respectively [10].

STEP 5. Update the value of $a$ according to (4') or (5') and, if the convergence criterion is not satisfied, go to Step 2.

ii) Full GN and GNM.

STEP 1. Choose initial guesses for $a$ and for $b$.

STEP 2. Compute the jacobian

$D[X(a)\, b + f(a)] = [X(a), \ D(X(a))\, b + D(f(a))] = J$

and factorize

$Q J S = \begin{bmatrix} R \\ 0 \end{bmatrix}$,

where $Q$ is $T \times T$ orthogonal, $R$ is $(m+n) \times (m+n)$ upper triangular, and $S$ is an $(m+n) \times (m+n)$ permutation matrix. Solve the system $R \tilde h = c_1$ and set $h = S \tilde h$, where

$Q \rho = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$,

$c_1$ having $m + n$ and $c_2$ having $T - m - n$ components, $\rho = X(a)\, b + f(a) - y$ being the current residual vector. Then update the current values of the parameters according to the GN iteration

$\begin{bmatrix} b \\ a \end{bmatrix} \leftarrow \begin{bmatrix} b \\ a \end{bmatrix} - \gamma h$,

or the GNM iteration, after completing - again in two stages - the factorization of the augmented jacobian

$\begin{bmatrix} J \\ \nu I \end{bmatrix} S = \bar Q' \begin{bmatrix} \bar R \\ 0 \end{bmatrix}$

($\bar Q$ is $(T + m + n) \times (T + m + n)$ orthogonal) and solving the upper triangular system $\bar R \tilde h = \bar c_1$, whence $h = S \tilde h$.

STEP 3. Test convergence, and go to Step 2 if the convergence criterion is not met.

Further details as well as FORTRAN listings are given in [5], [4].
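A driver for the restricted GN scheme (4') with the step-halving rule of Remark 2a may then be sketched as follows, building on `vp_residual` above. This again is our own illustration: the paper forms $W_2$ analytically via Step 3, whereas the sketch substitutes a forward-difference approximation of the derivative of $r_2$ purely to remain self-contained.

```python
import numpy as np

def restricted_gn(a0, x, y, model_X, model_f=None,
                  tol=1e-4, max_iter=50, decr=0.5, fd_step=1e-7):
    """Restricted GN iteration (4') with the steplength rule of Remark 2a."""
    a = np.asarray(a0, dtype=float)
    _, r2 = vp_residual(a, x, y, model_X, model_f)
    for _ in range(max_iter):
        # Forward-difference stand-in for W2(a) (the paper uses Step 3).
        W2 = np.empty((r2.size, a.size))
        for j in range(a.size):
            ap = a.copy()
            ap[j] += fd_step
            W2[:, j] = (vp_residual(ap, x, y, model_X, model_f)[1] - r2) / fd_step
        h = np.linalg.lstsq(W2, r2, rcond=None)[0]   # h = W2(a)^+ r2(a)
        gamma = 1.0                                  # full step first
        while True:                                  # Remark 2a: gamma <- DECR*gamma
            a_new = a - gamma * h
            _, r2_new = vp_residual(a_new, x, y, model_X, model_f)
            if r2_new @ r2_new < r2 @ r2 or gamma < 1e-12:
                break
            gamma *= decr
        if np.all(np.abs(a_new - a) <= tol * np.abs(a_new)):
            return a_new                             # ~ four significant digits
        a, r2 = a_new, r2_new
    return a
```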
It is customary to compare different iterative procedures in terms of the number of «multiplications» needed to perform one iteration. Let the number of multiplications for the unrestricted resp. restricted GN (GNM) be $N$ resp. $N_r$ ($N'$ resp. $N_r'$); then a simple but tedious calculation shows that

$N - N_r = nT + 3mn > 0$,
$N' - N_r' = N - N_r + n\,(7m + \tfrac{7}{2} n + \tfrac{25}{2}) > 0$,

i.e. the restricted algorithms allow a slight saving of computation for any combination of the parameters $T$, $m$, $n$. It may be remarked that this feature has not been pointed out in the existing literature.

To summarize, the restricted algorithms seem to offer a priori two main advantages over the unrestricted ones: i) reduction in dimension of the parameter space - a potentially valuable feature, especially when poor starting guesses are available; and ii) slightly reduced computational cost per iteration. A comparative investigation of the effectiveness of the above algorithms in treating a variety of test problems is given in the next Section.

3. Experimental results.

Some comparisons of the effectiveness of the algorithms outlined in the previous Section have been given in the literature [5], [7], [8]. However, examination of the test results presented so far can only give a fragmentary picture of the relative performances of the restricted and unrestricted algorithms, because each study used different test examples which were run on different computers. Moreover, Kaufman's algorithm, which seems to be the most efficient, has been tested by the author on two examples only. We feel that a more desirable state of affairs would be to conduct an evaluation using a uniform set of standards and test problems.

Our experiments consist of two test series. We made one series of tests on six problems which have been used by various authors of nonlinear least squares algorithms. Some of them have assumed the role of «classics», being repeatedly used to compare the performances of algorithms. The list of the test problems is given below. In what follows the term «optimal values» refers to those estimates of the parameters that have given the least known value of the sum of squares of the residuals.

EXAMPLE 1, [2]. Model:

$b\, [\exp(-10x) - \exp(-x)] + \exp(-a_1 x) - \exp(-a_2 x)$;

here $T = 10$, $n = 1$, $m = 2$.

Initial values: a) $a = (0, 20)'$, $b = 1.745$, $\|r\|^2 = 0.3944$; b) $a = (0, 10)'$, $b = 1.660$, $\|r\|^2 = 7.984\mathrm{E}{-2}$.

Optimal values: $a_1^* = a_2^*$, $b^* = 0$; or $a_1^* = 1$, $a_2^* = 10$, $b^* = 1$; or $a_1^* = 10$, $a_2^* = 1$, $b^* = -1$; $\|r^*\|^2 = 0$.

EXAMPLE 2, [8]. Model:

$b_1 \{\exp[q (x_1 - x_2 a_1) a_2] - 1\}/x_2 + b_2 \{\exp[q (x_1 - x_2 a_1) a_3] - 1\}/x_2 + b_3 (x_1 - x_2 a_1)/x_2$;

here $T = 19$, $n = 3$, $m = 3$.

Initial values: a) $a = (2.055, 0.4721, 1)'$, $b = (4.418\mathrm{E}{-8}, 3.523\mathrm{E}{-12}, 3.265\mathrm{E}{-5})'$, $\|r\|^2 = 6.204\mathrm{E}{-3}$; b) $a = (2.055, 0.05, 1)'$, $b = (8.894\mathrm{E}{-5}, 4.866\mathrm{E}{-12}, -1.612\mathrm{E}{-4})'$, $\|r\|^2 = 0.7708$.

Optimal values: $a^* = (1.673, 0.4272, 0.917)'$, $b^* = (2.009\mathrm{E}{-11}, 7.433\mathrm{E}{-8}, 3.156\mathrm{E}{-5})'$, $\|r^*\|^2 = 4.475\mathrm{E}{-3}$.

EXAMPLE 3, [10]. Model:

$b_1 + b_2 \exp(-a_1 x) + b_3 \exp(-a_2 x)$;

here $T = 33$, $n = 3$, $m = 2$.

Initial values: $a = (0.01, 0.02)'$, $b = (0.3266, 1.521, -0.9729)'$, $\|r\|^2 = 4.918\mathrm{E}{-3}$.

Optimal values: $a^* = (0.01287, 0.02212)'$, $b^* = (0.3754, 1.935, -1.464)'$, $\|r^*\|^2 = 5.465\mathrm{E}{-5}$.

EXAMPLE 4, [10]. Model:

$b_1 \exp(-a_1 x) + b_2 \exp[-a_2 (x - a_3)^2] + b_3 \exp[-a_4 (x - a_5)^2] + b_4 \exp[-a_6 (x - a_7)^2]$;

here $T = 65$, $n = 4$, $m = 7$.

Initial values: $a = (0.6, 3, 2, 5, 4.5, 7, 5.5)'$, $b = (1.313, 0.3342, 0.7262, 0.8198)'$, $\|r\|^2 = 1.289$.
Optimal values: $a^* = (0.7540, 0.9052, 2.398, 1.365, 4.568, 4.824, 5.675)'$, $b^* = (1.309, 0.4315, 0.6336, 0.5993)'$, $\|r^*\|^2 = 0.04014$.

EXAMPLE 5, [9]. Model:

$b\, a_1 x_1 / (1 + a_1 x_1 + a_2 x_2)$;

here $T = 5$, $n = 1$, $m = 2$.

Initial values: $a = (10.39, 48.83)'$, $b = 0.5058$, $\|r\|^2 = 0.01378$.

Optimal values: $a^* = (3.151, 15.16)'$, $b^* = 0.7800$, $\|r^*\|^2 = 4.355\mathrm{E}{-5}$.

EXAMPLE 6, [9]. Model:

$b_1 + b_2 \exp(a x)$;

here $T = 10$, $n = 2$, $m = 1$.

Initial values: $a = 0.5$, $b = (17.35, 1.967\mathrm{E}{-9})'$, $\|r\|^2 = 1.978$.

Optimal values: $a^* = 0.02$, $b^* = (15.50, 12)'$, $\|r^*\|^2 = 1.\mathrm{E}{-12}$.

We point out that the initial values of the linear parameters - required by the full GN and GNM - were chosen according to (2), i.e. given $a_0$, we put $b_0 = X^+(a_0)\, v(a_0)$. (The initial values of the nonlinear parameters are taken from the references.) Using this trick, which is recommended in [5] even for standard nonlinear least squares procedures, the unrestricted GN and GNM give significantly better results than those reported in the literature for the same problems.

Table 1 presents the results of fitting the above models with the data values reported in the mentioned references. Here we have denoted by NI, NF the number of iterations and the number of function evaluations to convergence. More precisely, NI is to be understood as the number of gradient calls, while NF denotes the total number of times the objective function was calculated. The termination criterion was set at four significant digits in the parameter estimates; the iteration cutoff was set at 50 due to computer time limitations, and was completely arbitrary. Also tabulated is the obtained sum of squared residuals. All computational work was performed in single precision arithmetic on a CDC 6600 computer. In order to eliminate the coding aspect, we have used essentially the same routines for the unrestricted and the corresponding restricted versions of the algorithms.

The above results are typical of the relative performances of the algorithms. In fact, in all problems that have been tested, the restricted methods either behaved at least as well as the corresponding unrestricted methods - even using the mentioned «optimal» choice of the starting values for the linear parameters - or succeeded where GN or GNM had failed to obtain the optimal values. A closer inspection of Table 1 indicates that the restricted GN slightly outperformed the restricted GNM, and both were quite superior to the unrestricted methods. It is remarkable that our test with the restricted GNM for problem 2 showed convergence in fewer iterations than reported in [8]. The same remark applies to our results for problems 3 and 4, as compared to those given by Kaufman. Finally, we point out that computational results with our version of standard GNM showed faster convergence than reported in [10] for problems 4 and 5, even using the same starting guesses as provided by Osborne. For further comparisons the reader can refer to the mentioned references.

In the second test series we tested the algorithms on approximation of artificially generated data of the form [12]

$y_t = \exp(-0.1\, x_t) - \exp(-2\, x_t) + \varepsilon\, \eta_t, \qquad t = 1, \ldots, 25$,

where the $\eta_t$ are taken from a sequence of normal pseudo-random numbers with mean 0 and standard deviation 1. By varying the value of $\varepsilon$ we can simulate a range of possible sets of data values, and this permits testing the robustness of the algorithms.
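A generator for this test series might look as follows; this is our sketch, and since the abscissae $x_t$ and the seeding of the pseudo-random sequence are not specified in the text, $x_t = t$ is an assumption.

```python
import numpy as np

def make_test_data(eps, T=25, seed=None):
    """y_t = exp(-0.1 x_t) - exp(-2 x_t) + eps * eta_t, eta_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.arange(1.0, T + 1.0)        # assumed design points x_t = t
    y = np.exp(-0.1 * x) - np.exp(-2.0 * x) + eps * rng.standard_normal(T)
    return x, y

# Fitting b1 exp(-a1 x) + b2 exp(-a2 x) from the starting point (0.2, 1.8)
# used for all runs, with the restricted GN sketch given earlier:
x, y = make_test_data(eps=0.01)
a_hat = restricted_gn([0.2, 1.8], x, y,
                      model_X=lambda a, x: np.exp(-np.outer(x, a)))
```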
The results of some representative runs are listed in Table 2. All runs were started at (0.2, 1.8). It will be seen that the methods behaved identically when $\varepsilon$ is reasonably small, while the restricted GN algorithm yielded a significant improvement in iterations for quite large perturbations.

In addition to the test examples described above, the restricted algorithms have been used successfully in the solution of several «real world» problems arising in various application fields such as econometrics (e.g. maximum likelihood estimation of the linear expenditure system) and physical chemistry (e.g. vapour-pressure models in the vapour-liquid equilibrium of pure substances), showing the same qualitative behaviour as reported in this Section.

4. Conclusions.

Numerical comparisons are notoriously difficult to generalize, involving as they do factors such as choice of test function, starting position, convergence criterion, steplength or damping factor, as well as the computer software and hardware characteristics. However, from the work accomplished thus far the conclusion can be drawn that the linear-nonlinear approach described in Section 2 yields a remarkable improvement over the standard all-nonlinear methods. In fact, the results we have seen indicate that the restricted algorithms are less sensitive to the choice of the starting guesses, require fewer iterations to converge, and are more reliable than the traditional GN and GNM schemes, even in perturbed cases.

[Table 1 and Table 2 appear here; the tabulated NI, NF and residual values are not recoverable from the scan.]

REFERENCES

[1] R. H. Barham, W. Drane, An algorithm for least squares estimation of nonlinear parameters when some of the parameters are linear, Technometrics 14 (1972), 757-766.

[2] K. M. Brown, J. E. Dennis, Derivative free analogues of the Levenberg-Marquardt and Gauss algorithms for nonlinear least squares approximation, Numer. Math. 18 (1972), 289-297.

[3] C. Corradi, L. Stefanini, Un algoritmo per la soluzione di problemi di minimi quadrati non lineari, C.N.E.N. tech. rept. RT/EDP (76)5, (1976).

[4] L. Gardini, L. Stefanini, Procedure di minimi quadrati lineari e non lineari basate su trasformazioni ortogonali, SOGESTA internal report A-253, 1976.

[5] G. H. Golub, V. Pereyra, The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate, SIAM J. Numer. Anal. 10 (1973), 413-432.

[6] L. S. Jennings, M. R. Osborne, Applications of orthogonal matrix transformations to the solution of systems of linear and nonlinear equations, Tech. Rept. 37, Computer Centre, Australian National University, (1970).

[7] L. Kaufman, A variable projection method for solving separable nonlinear least squares problems, BIT 15 (1975), 49-57.

[8] F. T. Krogh, Efficient implementation of a variable projection algorithm for nonlinear least squares problems, Comm. ACM 17 (1974), 167-169.

[9] R. R. Meyer, P. M. Roth, Modified damped least squares: an algorithm for nonlinear estimation, J. Inst. Math. Appl. 9 (1972), 218-233.

[10] M. R. Osborne, Some aspects of nonlinear least squares calculations, in «Numerical methods for nonlinear optimization», edited by F. A. Lootsma (1972), Acad.
Press, London and New York, 171-189.

[11] M. R. Osborne, Some special nonlinear least squares problems, SIAM J. Numer. Anal. 12 (1975), 571-591.

[12] A. Ruhe, P. Å. Wedin, Algorithms for separable nonlinear least squares problems, Stanford Computer Science Tech. Rept. 434, (1974).