Adjoint Broyden à la GMRES

Andreas Griewank¹*, Sebastian Schlenkrich², and Andrea Walther²
¹ Institut für Mathematik, HU Berlin
² Institut für Wissenschaftliches Rechnen, TU Dresden

October 21, 2007

Abstract

It is shown here that a compact storage implementation of a quasi-Newton method based on the adjoint Broyden update reduces in the affine case exactly to the well established GMRES procedure. Generally, storage and linear algebra effort per step are small multiples of n·k, where n is the number of variables and k the number of steps taken in the current cycle. In the affine case the storage is exactly (n + k)·k and in the nonlinear case the same bound can be achieved if adjoints, i.e. transposed Jacobian-vector products, are available. A transpose-free variant that relies exclusively on Jacobian-vector products (or possibly their approximation by divided differences) requires roughly twice the storage and turns out to be somewhat slower in our numerical experiments reported at the end.

Keywords: nonlinear equations, quasi-Newton methods, adjoint based update, compact storage, generalized minimal residual, Arnoldi process, automatic differentiation

* Partially supported by the DFG Research Center Matheon "Mathematics for Key Technologies", Berlin. Corresponding author, e-mail: griewank@mathematik.hu-berlin.de, Fax: +49-30-2093-5859.

1 Introduction and Motivation

As shown in [SGW06, GSW06, SW06] the adjoint Broyden method described below has some very nice properties, which lead to strong theoretical convergence properties and good experimental results. A standard objection to low rank updating of approximate Jacobians is that their storage and manipulation involves O(n²) locations and operations per step, respectively, since sparsity and other structure seems immediately lost. In the case of unconstrained optimization this drawback has been overcome by very successful limited memory variants [NW99] of the quasi-Newton method BFGS, which in the case of quadratic objectives and thus affine gradients reduce to conjugate gradients, the method of choice for positive definite linear systems. Since GMRES has a similar status with respect to the iterative solution of nonsymmetric systems it is a natural idea to implement a nonlinear solver that reduces automatically to GMRES on affine systems. As it turns out this is the case for a suitable implementation of the adjoint Broyden method. The insight gained from the affine scenario also helps us in dealing with singularities and other contingencies in the general case.

The paper is organized as follows. In Section 2 we describe the adjoint Broyden scheme and its main properties. In Section 4 we develop a compact storage implementation with several variants depending on the derivative vectors that are available. These are all equivalent in the affine case, for which we show in Section 5 that the iterates are identical to the ones produced by GMRES, provided a linearly exact line-search is employed. Nevertheless, our methods are geared towards the general, nonlinear scenario, where the basic scheme is guaranteed to converge [Sch07, Sec. 4.3.2], provided singularity of the actual Jacobian is excluded. Finally, in Section 6 we report comparative numerical results, mostly on nonlinear problems.

2 Description of the quasi-Newton method

We consider the iterative solution of a system of nonlinear equations F(x) = 0, assuming that $F : \mathbb{R}^n \to \mathbb{R}^n$ has a Lipschitz continuously differentiable Jacobian $F'(x) \in \mathbb{R}^{n \times n}$ in some neighborhood $\mathcal{N} \subset \mathbb{R}^n$ of interest.
Given an initial estimate $x_0$ reasonably close to some root $x_* \in F^{-1}(0) \cap \mathcal{N}$ and an easily invertible approximation $A_{-1}$ to $F'(x_*)$ we may apply our algorithms to the transformed problem

  $0 = \tilde F(\tilde x) \equiv F(x_0 + A_{-1}^{-1}\,\tilde x).$

Therefore we will assume without loss of generality that the original problem has been rewritten such that for some scaling factor $0 \ne \gamma \in \mathbb{R}$

  $A_{-1} = \gamma I$ and $x_0 = 0.$

This assumption on $A_{-1}$ greatly simplifies the notation without affecting the mathematical relations for any sensible algorithm. Throughout the paper we use the convention that the subscript $k$ labels all quantities related to the iterate $x_k$ as well as all quantities concerning the step from $x_{k-1}$ to $x_k$. Hence the basic iteration is

  $x_k = x_{k-1} + \alpha_k s_k$ with $A_{k-1} s_k = -\delta_k F_{k-1}$ and $\alpha_k \in \mathbb{R} \ni \delta_k,$

where $F_{k-1} \equiv F(x_{k-1})$. After each iteration the Jacobian approximation $A_{k-1}$ is updated to a new version $A_k$ in a way that distinguishes various quasi-Newton methods and is our principal concern in this section. The scalar $\delta_k$ allows for (near) singularity of the approximate Jacobian $A_{k-1}$ and $\alpha_k$ represents the line-search multiplier, both of which will be discussed below. Whenever discrepancies are computed or symbolically represented, we subtract the (more) exact quantity from the (more) approximate quantity. This is just a notational convention concerning the selection of signs in forming differences.

The Rank-one Update

Our methods are based on the following update formula.

Definition 1 (Adjoint Broyden update) For a given matrix $A_{k-1} \in \mathbb{R}^{n \times n}$ and a current point $x_k \in \mathbb{R}^n$ set

  $A_k = A_{k-1} - v_k v_k^\top (A_{k-1} - F'_k)$ with $F'_k \equiv F'(x_k),$   (1)

where $v_k = \sigma_k/\|\sigma_k\|$ and $\sigma_k$ is chosen according to one of the three options:

  'Tangent':  $\sigma_k = (A_{k-1} - F'_k)\, s_k$ for some $s_k \in \mathbb{R}^n \setminus \{0\}$,
  'Residual': $\sigma_k = F_k$,
  'Secant':   $\sigma_k = A_{k-1} s_k - (F_k - F_{k-1})/\alpha_k$ for some $s_k \in \mathbb{R}^n \setminus \{0\}$.

It can be easily seen that the formula represents the least change update with respect to the Frobenius matrix norm in order to satisfy the adjoint tangent condition

  $A_k^\top v_k = F_k'^\top v_k.$

The residual choice has the nice property that after the update

  $A_k^\top F_k = F_k'^\top F_k = \nabla f(x_k)$ for $f(x) \equiv \|F(x)\|^2/2,$

so that the gradient of the squared residual norm is reproduced exactly. Throughout the paper $\|\cdot\|$ denotes the Euclidean norm of vectors and the corresponding induced 2-norm of matrices. When $\sigma_k$ is selected according to the tangent or secant option, the primal tangent condition $A_k s_k = F'_k s_k$ is satisfied approximately in that

  $\|(A_k - F'_k)\, s_k\|/\|s_k\| = O(\|x_k - x_{k-1}\|).$

When a full quasi-Newton step $s_k = -A_{k-1}^{-1} F_{k-1}$ with $\alpha_k = 1 = \delta_k$ has been taken then the residual and the secant options are identical. The secant option reduces to the tangent option as $\alpha_k \to 0$ or when $F$ is affine in the first place. Throughout the paper we will allow the choice $\alpha_k = 0$, which amounts to a pure tangent update of the Jacobian without any change in the iterate $x_k$ itself. Several such primally stationary iterations may be interpreted as part of an inexact Newton method, which approximately solves the linearization of the given vector function at the current primal point $x_k$.
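To make the three options concrete, the following is a minimal dense Matlab sketch of the update (1). The function name and argument list are ours and not those of the authors' code abrnlq2, and Fp stands for the explicitly formed Jacobian, which a serious implementation would of course never store; it is used here only to keep the illustration self-contained.

    function A = adjoint_broyden_update(A, Fp, s, F1, F0, alpha, option)
    % One adjoint Broyden update (1); A is A_{k-1} on entry and A_k on exit,
    % Fp = F'(x_k), s = s_k, F1 = F(x_k), F0 = F(x_{k-1}), alpha = alpha_k.
      switch option
        case 'tangent'      % sigma_k = (A_{k-1} - F'_k) s_k
          sigma = (A - Fp)*s;
        case 'residual'     % sigma_k = F_k
          sigma = F1;
        case 'secant'       % sigma_k = A_{k-1} s_k - (F_k - F_{k-1})/alpha_k
          sigma = A*s - (F1 - F0)/alpha;
      end
      v = sigma/norm(sigma);
      A = A - v*(v'*(A - Fp));  % A_k = A_{k-1} - v_k v_k' (A_{k-1} - F'_k)
    end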
Heredity Properties

In the case of an affine function $F(x) \equiv Ax - b$ the tangent and secant options yield identically

  $\sigma_k = (A_{k-1} - A)\, s_k = D_{k-1} s_k$ with $D_{k-1} \equiv A_{k-1} - A \in \mathbb{R}^{n \times n}.$

Then it follows from (1) that the discrepancy matrices $D_k$ satisfy the recurrence

  $D_k = D_{k-1} - D_{k-1} s_k\, s_k^\top D_{k-1}^\top D_{k-1}/\|D_{k-1} s_k\|^2 = (I - v_k v_k^\top)\, D_{k-1}.$

From this projective identity one sees immediately that the nullspaces of $D_k$ and its transpose $D_k^\top$ grow monotonically with each update and must encompass the whole space $\mathbb{R}^n$ after at most $n$ updates that are well defined in that their denominator does not vanish. In other words in the affine case the tangent and secant updates exhibit direct and adjoint heredity in that

  $A_k s_j = A s_j$ and $A_k^\top \sigma_j = A^\top \sigma_j$ for $0 \le j \le k.$

When the residual update is applied intermittently with $\sigma_k = F_k \notin \operatorname{range}(D_{k-1})$ and thus $v_k \notin \operatorname{range}(D_{k-1})$ the direct heredity is maintained but adjoint heredity may be lost. Such updates can be viewed as a reset and are expected to be particularly useful in the nonlinear case.

Jacobian Initialization

It is well known for unconstrained optimization by some variant of BFGS that, starting from an initial Hessian approximation of the form $\gamma I$, the performance may be strongly dependent on the choice of the scalar $\gamma \ne 0$. This is so in general, even though on quadratic problems with exact line-searches the iterates are mathematically invariant with respect to $\gamma \ne 0$. Hence we will also look here for a suitable initial scaling.

Another aspect of the initialization is that in order to agree with GMRES on affine problems, we have to begin with a residual update using $\sigma_0 = F_0$ before the very first iteration. This implies in the affine case for all subsequent residual gradients $\nabla f(x_k)^\top = F_k^\top F'_k = F_k^\top A_k$, which ensures for the quasi-Newton steps $s_{k+1} = -\delta_{k+1} A_k^{-1} F_k$ that

  $\nabla f(x_k)^\top s_{k+1} = F_k^\top F'_k\, s_{k+1} = -\delta_{k+1}\, \|F_k\|^2.$   (2)

For $\delta_{k+1} > 0$ we have therefore descent, a property that need not hold in the nonlinear situation as we will discuss below. Starting from $A_{-1} = \gamma I$ with any $\gamma$ we obtain by the initial residual update

  $A_0 = \gamma I - v_0 v_0^\top (\gamma I - F'_0)$ with $\det(A_0) = \gamma^{\,n-1}\, v_0^\top F'_0 v_0.$

A reasonable idea for choosing $\gamma$ seems to be minimizing the Frobenius norm of the resulting update from $A_{-1}$ to $A_0$. This criterion leads to $\gamma = v_0^\top F'_0 v_0$, a number that may be positive or negative but unfortunately also zero. That exceptional situation arises exactly if $\det(A_0) = 0$ with the nullvector being $v_0$ irrespective of the choice of $\gamma$. In any case we have by the Cauchy-Schwarz inequality

  $|v_0^\top F'_0 v_0| \le \|F'_0 v_0\|,$

where the right hand side does not vanish provided $F'_0$ is nonsingular as we will assume throughout. Hence we conclude that

  $\gamma = \operatorname{sign}(v_0^\top F'_0 v_0)\, \|F'_0 v_0\|$

can be used as initial scaling. Should the first component be zero the sign can be selected arbitrarily from $\{+1, -1\}$. We could be a little bit more sophisticated here and choose the size $|\gamma|$ as the Frobenius norm of the first extended Hessenberg matrix $\bar H_0 \in \mathbb{R}^{2\times 1}$ generated by GMRES, but that complicates matters somewhat in requiring some look-ahead, especially in the nonlinear situation.
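For concreteness, the initialization just described can be sketched in a few lines of Matlab. As before Fp0 denotes a dense Jacobian $F'(x_0)$ kept only for illustration; in practice merely the two products $F'_0 v_0$ and $v_0^\top F'_0$ are needed.

    % Sketch of the Jacobian initialization: residual update of A_{-1} = gamma*I
    % with v0 = F0/||F0|| and the scaling gamma = sign(v0' F'_0 v0) ||F'_0 v0||.
    n  = length(F0);
    v0 = F0/norm(F0);
    w  = Fp0*v0;                  % tangent product F'(x0) v0
    g  = v0'*w;                   % v0' F'(x0) v0, may vanish
    if g == 0, g = 1; end         % arbitrary sign in the exceptional case
    gamma = sign(g)*norm(w);
    A0 = gamma*eye(n) - (v0*v0')*(gamma*eye(n) - Fp0);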
Occurrence and Handling of Singularity

As we have seen above the contingency $\det(A_k) = 0$ may arise theoretically already when $k = 0$. In practice we are much more likely to encounter nearly singular $A_k$, for which the full quasi-Newton directions $s_{k+1} = -A_k^{-1} F_k$ become excessively large and strongly affected by round-off. Provided we update along a null-vector whenever $A_{k-1}$ is singular, we have even theoretically at most one null direction according to the following lemma.

Lemma 2 (Rank Drop at most One) If $A_{k-1} s_k = -\delta_k F_{k-1}$ for $\delta_k \in \mathbb{R}$ with $s_k \ne 0$ and $F'_k s_k \ne 0$, then the tangent option $\sigma_k = (A_{k-1} - F'_k)\, s_k$ ensures for the update (1) that

  $\operatorname{rank}(A_k) - \operatorname{rank}(A_{k-1})\ \begin{cases} = 1 & \text{if } \delta_k = 0 \text{ and } F'_k s_k \notin \operatorname{range}(A_{k-1}), \\ = 0 & \text{if } \delta_k = 0 \text{ and } F'_k s_k \in \operatorname{range}(A_{k-1}), \\ \in \{0, 1\} & \text{if } \delta_k \ne 0 \text{ and } F'_k s_k \notin \operatorname{range}(A_{k-1}), \\ \in \{-1, 0\} & \text{if } \delta_k \ne 0 \text{ and } F'_k s_k \in \operatorname{range}(A_{k-1}). \end{cases}$

Proof: The tangent update always takes the explicit form

  $A_k = A_{k-1} - (A_{k-1} - F'_k)\, s_k\, s_k^\top (A_{k-1} - F'_k)^\top (A_{k-1} - F'_k)\,/\,\|(A_{k-1} - F'_k)\, s_k\|^2.$

If $F'_k s_k \in \operatorname{range}(A_{k-1})$ the range of $A_k$ is contained in that of $A_{k-1}$ so that the rank cannot go up, which implies immediately the fourth case as a rank-one update can only change the rank by one up or down. If $F'_k s_k \notin \operatorname{range}(A_{k-1})$ then multiplication of the above equation from the right by a prospective nullvector $v$ shows that the coefficient of $F'_k s_k$ and thus the whole rank one term must vanish. Hence $v$ must already be a nullvector of $A_{k-1}$ and thus the rank cannot go down, which implies in particular the third case. When $\delta_k = 0$ and thus $A_{k-1} s_k = 0$ the update simplifies to

  $A_k = A_{k-1} + F'_k s_k\, s_k^\top F_k'^\top (F'_k - A_{k-1})\,/\,\|F'_k s_k\|^2$

so that $s_k$ is a nullvector of $A_{k-1}$ but not a null-vector of $A_k$. Hence we have also proven the assertion for the first case, as there can be no new null-vector as observed above. In the remaining second case all nullvectors of $A_{k-1}$ that are orthogonal to $(F'_k s_k)^\top F'_k$ are also nullvectors of $A_k$ and there is exactly one additional nullvector, which we may construct as follows. Let $F'_k s_k = A_{k-1} \bar v_k$. Then for the single value $\beta = -s_k^\top F_k'^\top F'_k \bar v_k/\|F'_k s_k\|^2 \in \mathbb{R}$ we obtain

  $A_k(\bar v_k + \beta s_k) = F'_k s_k\, \big(\beta + s_k^\top F_k'^\top F'_k \bar v_k/\|F'_k s_k\|^2\big) = 0.$

The lemma has the following algorithmic consequences. If $A_0$ has at least rank $n-1$ and we select $s_k$ as a nullvector, i.e. set $\delta_k = 0$, whenever $A_{k-1}$ is singular, then the rank of the approximations $A_k$ can never drop below $n-1$. We will call this approach of setting $\delta_k = 0$ as soon as $A_{k-1}$ is singular the full rank strategy. Exactly which value $\delta_k \ne 0$ we choose when $A_{k-1}$ is nonsingular does not make much difference in the affine case, but is of course quite important in the nonlinear case, unless we perform an exact line-search such that the scaling of $s_k$ becomes irrelevant. We can only deviate from the full rank strategy when the approximate Jacobian $A_{k-1}$ is singular but $F_{k-1}$ still happens to be in its range. Then we might still choose $\delta_k \ne 0$ and determine $s_k$ as some solution to the consistent linear system $A_{k-1} s_k = -F_{k-1}\delta_k$. This choice of $s_k$ is even theoretically nonunique and practically subject to severe numerical instability, especially in the nonlinear scenario.

3 Smooth formulation via Adjugate

In the affine situation we will see that the singularly consistent linear systems can never occur and that the resulting property $\operatorname{rank}(A_k) \ge n-1$ is related to the well-known fact that the Hessenberg matrix $H_k$ in the Arnoldi process never suffers a rank drop of more than one, provided the system matrix itself is nonsingular. To define $s_{k+1}$ uniquely as a smooth function of $A_k$ and $F_k$ we may set $\delta_{k+1} = \det(A_k)$ and use the adjugate $\operatorname{adj}(A_k)$ defined as the continuous solution to the identity

  $A_k \operatorname{adj}(A_k) = \det(A_k)\, I = \operatorname{adj}(A_k)\, A_k.$

The entries of $\operatorname{adj}(A_k)$ may be defined as the co-factors of $A_k$. Then we may define the steps consistently and nicely bounded via

  $s_{k+1} \equiv -\operatorname{adj}(A_k)\, F_k \quad\Longrightarrow\quad A_k s_{k+1} = -\det(A_k)\, F_k.$

If $\operatorname{rank}(A_k) = n-1$ there exist nonzero nullvectors $u_k$ and $w_k \in \mathbb{R}^n$ such that

  $\operatorname{adj}(A_k) = w_k u_k^\top$ with $A_k w_k = 0$ and $u_k^\top A_k = 0.$

Then the above formula yields the step $s_{k+1} = -w_k u_k^\top F_k$, so that we have $s_{k+1} = 0$ exactly when $u_k^\top F_k = 0$, i.e. either $F_k = 0$ or $0 \ne u_k \perp$
$F_k \ne 0$, where the second possibility can only occur when $A_k$ is singular. The first represents regular termination because the system is solved, whereas the second possibility indicates premature break down of the method if it is indeed defined in terms of the adjugate. It means that the linear system $A_k s_{k+1} = -F_k$ is singular but still consistent as $F_k$ happens to lie in the range $\{u_k\}^\perp$ of $A_k$. Hence nonzero solutions $s_{k+1}$ would exist but not be unique and in the presence of round-off possibly very large. Fortunately this contingency cannot occur in the affine scenario as we will see in Section 5. If it does in the nonlinear case we may define $s_{k+1}$ as some nonzero null-vector of $A_k$, which is essentially unique as long as $\operatorname{rank}(A_k) = n-1$, irrespective of whether $F_k$ is in its range or not. Alternatively we may reset $A_k$ to $A_0$ as discussed above with $F_0 = F_k$, which certainly ensures that the subsequent step is well defined.

The use of the adjugate is more of an aesthetic device in view of the affine scenario that is of particular interest in this paper. It does however alleviate the need to distinguish the cases $\operatorname{rank}(A_k) = n$ and $\operatorname{rank}(A_k) = n-1$ in proofs and other developments. The numerical computation of $s_{k+1} = -\operatorname{adj}(A_k)\, F_k$ can be performed simply and stably on the basis of an LU- or QR factorization of $A_k$. To have a better chance of obtaining a descent direction one may multiply the step by $\operatorname{sign}(\det(A_k))$, which guarantees descent according to (2) in the affine case. More reliable for the nonlinear case would be to always evaluate the directional derivative $\nabla f(x_k)^\top s_{k+1}$ and if necessary switch the sign of $s_{k+1}$ before entering the line-search.

Line-Search Requirements

The line-search from [Gri86] sketched below makes no assumption regarding the directional derivative and thus may produce negative step-multipliers. Moreover, if $s_k \ne 0$ is selected as an arbitrary null-vector of $A_{k-1}$ whenever $\det(A_{k-1}) = 0$, then that line-search ensures convergence from within level sets of $f$ in which the actual Jacobian $F'(x)$ has no singularities. That is true even if $A_0$ is initialized to the null matrix, which would leave a lot of indeterminacy for the first $n$ step selections. The least-squares calculation at the heart of the GMRES procedure may be effected in our quasi-Newton method through an appropriate line-search. Since for affine $F(x) = Ax - b$ the function

  $\tilde f_k(\alpha) \equiv f(x_{k-1} + \alpha s_k) = \|F_{k-1} + \alpha A s_k\|^2/2$

is quadratic, just three values of $\tilde f_k$ or two values and one directional derivative will be enough to compute the exact minimizer $\alpha_k^* \in \mathbb{R}$. Alternatively, we may interpolate the vector function itself by

  $\tilde F_k(\alpha) \equiv (1 - \alpha)\, F_{k-1} + \alpha\, F(x_{k-1} + s_k)$

on the basis of $F_{k-1}$ and $F(x_{k-1} + s_k)$ alone. In the affine situation we have exactly $\tilde f_k(\alpha) = \|\tilde F_k(\alpha)\|^2/2$ so that the two approaches are equivalent and yield the optimal multiplier

  $\alpha_k^* = -\,F_{k-1}^\top A s_k\,/\,\|A s_k\|^2.$

The multiplier $\alpha_k^*$ may be negative or even zero but it always renders the new residual $F_k = F_{k-1} + \alpha_k^* A s_k$ exactly orthogonal to $A s_k$. This orthogonality is crucial to proving the equivalence with GMRES and we will call any line-search yielding such an $\alpha_k^*$ in the affine case linearly exact. Throughout we will refer to the step $x_k - x_{k-1} = \alpha_k s_k$ as

  trivial: $s_k = 0$;  full: $\alpha_k = 1/\delta_k$;  singular: $\det(A_{k-1}) = 0$;  exact: $\alpha_k = \alpha_k^*$.

In the nonlinear situation we may have to perform several interpolations as described in [Gri86] before an acceptable $\alpha_k$ is reached. As we will see in the final section our line-search based on vector interpolation rarely requires more than one readjustment of $\alpha_k$ from the initial estimate $\alpha_k = 1/\delta_k$. Of course in the affine case the initial guess does not matter at all if at least one interpolation is performed so that $\alpha_k^*$ is reached.
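As an illustration, here is a minimal Matlab sketch of one interpolation of this vector line-search. Feval is a placeholder of ours for the evaluation of F; for affine F the computed multiplier is the linearly exact $\alpha_k^*$ from above, while in the nonlinear case it would only serve as a trial value within the safeguarded search of [Gri86].

    % One interpolation of the vector line-search: model F along the step by
    % Ftilde(alpha) = (1-alpha)*F0 + alpha*F1 and minimize ||Ftilde||^2/2.
    F1 = Feval(x + s);            % trial value F(x_{k-1} + s_k)
    d  = F1 - F0;                 % equals A*s_k when F is affine
    alpha = -(F0'*d)/(d'*d);      % minimizer, may be negative or zero
    x  = x + alpha*s;             % renders F_k orthogonal to d (affine case)
    F0 = Feval(x);                % new residual F_k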
Algorithmic Specification

Putting the pieces together we get the following algorithm.

Algorithm 3 (Adjoint Broyden)
  Initialize: Set $x_0 = 0$ and $A_0 = \gamma I - v_0 v_0^\top (\gamma I - F'_0)$ with $v_0 = F_0/\|F_0\|$ and $\gamma = \operatorname{sign}(v_0^\top F'_0 v_0)\, \|F'_0 v_0\|$; set $k = 1$.
  Iterate: Compute $s_k = -\operatorname{adj}(A_{k-1})\, F_{k-1}$ and define $\sigma_k$ by the tangent or secant option.
  Terminate: If $\|\sigma_k\| \le \varepsilon$ return $x_* = x_{k-1} + s_k/\delta_k$ and stop.
  Update: Increment $x_k = x_{k-1} + \alpha_k s_k$ for some $\alpha_k \in \mathbb{R}$, set $v_k = \sigma_k/\|\sigma_k\|$, update $A_k = A_{k-1} - v_k v_k^\top (A_{k-1} - F'_k)$ and continue with Iterate for $k = k+1$.

The algorithm involves at each iteration one evaluation of $F_{k-1}$, one of $v_k^\top F'_k$, and a few trial values for $F_k$ during the line-search. In terms of linear algebra we have to compute the step $s_k$ by solving a system in the approximated Jacobian $A_{k-1}$ and then update an appropriate representation of it to that of $A_k$. This means that both linear algebra subtasks require $O(n^2)$ operations and the storage requirement is $n^2$ or $1.5\, n^2$ floating point numbers for a QR and LU version, respectively.

4 Compact Storage Implementation

In order to reduce storage and linear algebra at least for early iterations we consider the additive expansion

  $A_k = \gamma I - \sum_{j=0}^{k} v_j v_j^\top (A_{j-1} - F'_j).$

Abbreviating

  $V_k \equiv [v_0, v_1, \ldots, v_k] \in \mathbb{R}^{n \times (k+1)}$ and $W_k \equiv [F_0'^\top v_0, \ldots, F_k'^\top v_k] \in \mathbb{R}^{n \times (k+1)}$

we obtain the following representation of $A_k$ and its inverse.

Lemma 4 (Factorized Representation) With $L_k^{-1} \in \mathbb{R}^{(k+1)\times(k+1)}$ the lower triangular part of $V_k^\top V_k$ including its diagonal we have

  $A_k = \gamma I - V_k L_k\, (\gamma V_k - W_k)^\top$ and $\det(A_k) = \gamma^{\,n-k-1}\, \det(H_k),$

where

  $H_k \equiv W_k^\top V_k - \gamma R_k$ and $R_k \equiv V_k^\top V_k - L_k^{-1} \in \mathbb{R}^{(k+1)\times(k+1)}$

with $R_k$ being strictly upper triangular. Sherman-Morrison-Woodbury yields the inverse

  $A_k^{-1} = I/\gamma + V_k H_k^{-1} (V_k - W_k/\gamma)^\top$

if $\det(A_k) \ne 0$ and in any case the adjugate

  $\operatorname{adj}(A_k) = \det(A_k)\, I/\gamma + \gamma^{\,n-k-1}\, V_k \operatorname{adj}(H_k)\, (V_k - W_k/\gamma)^\top.$

Proof: For $k = -1$ the first assertion holds trivially with all matrices other than $A_{-1}$ vanishing completely. The induction from $k-1$ to $k$ works as follows:

  $A_k = A_{k-1} - v_k v_k^\top (A_{k-1} - F'_k) = (I - v_k v_k^\top)\, A_{k-1} + v_k v_k^\top F'_k$
  $\quad = \gamma I - \gamma\, v_k v_k^\top - (I - v_k v_k^\top)\, V_{k-1} L_{k-1} (\gamma V_{k-1} - W_{k-1})^\top + v_k v_k^\top F'_k$
  $\quad = \gamma I - [V_{k-1}, v_k] \begin{bmatrix} L_{k-1} & 0 \\ -v_k^\top V_{k-1} L_{k-1} & 1 \end{bmatrix} \big[\gamma V_{k-1} - W_{k-1},\ \gamma v_k - F_k'^\top v_k\big]^\top$
  $\quad = \gamma I - V_k L_k\, (\gamma V_k - W_k)^\top.$

Hence we have proven the representation of $A_k$ provided $L_k$ is shown to be the inverse of the lower triangular part of $V_k^\top V_k$, assuming this relation holds for $L_{k-1}$. That last part of the induction holds since

  $\begin{bmatrix} L_{k-1}^{-1} & 0 \\ v_k^\top V_{k-1} & 1 \end{bmatrix} \begin{bmatrix} L_{k-1} & 0 \\ -v_k^\top V_{k-1} L_{k-1} & 1 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & 1 \end{bmatrix}$

so that the matrix in the middle represents indeed $L_k$. Assuming first $\det(H_k) \ne 0$ we obtain according to the Sherman-Morrison-Woodbury formula the inverse

  $A_k^{-1} = \frac{1}{\gamma}\Big[I + V_k \big(L_k^{-1} - (V_k - W_k/\gamma)^\top V_k\big)^{-1} (V_k - W_k/\gamma)^\top\Big] = \frac{1}{\gamma}\Big[I + \gamma\, V_k H_k^{-1} (V_k - W_k/\gamma)^\top\Big],$

which can obviously be rewritten in the asserted form using the matrices $R_k$ and $H_k$. The adjugate is obtained by multiplying both sides with $\det(A_k) = \gamma^{\,n-k-1} \det(H_k)$.

Since $L_k$ is not needed explicitly we can implement the adjoint Broyden method storing the two $n \times (k+1)$ matrices $V_k$, $W_k$ and the matrix $H_k \in \mathbb{R}^{(k+1)\times(k+1)}$ in factorized or inverted form. For small $k$ this is certainly much less than the usual dense LU or QR implementation of $A_k$. However, as $k$ approaches $n$ it is significantly more, even if we do not store $V_k^\top V_k$, which is only needed for the application of $A_k$ itself.
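The representation of Lemma 4 is easy to check numerically. The following Matlab fragment is our own sketch; it assumes that V, W and a densely accumulated Ak (for instance from the update function sketched in Section 2) are available from a run with scaling gamma.

    % Check of Lemma 4: V = V_k, W = W_k in R^{n x (k+1)}, Ak accumulated densely.
    kk   = size(V,2) - 1;                % k, so that V_k has k+1 columns
    Linv = tril(V'*V);                   % L_k^{-1}, unit diagonal since ||v_j|| = 1
    R    = triu(V'*V, 1);                % strictly upper triangular R_k
    H    = W'*V - gamma*R;               % H_k
    Acmp = gamma*eye(n) - V*(Linv\(gamma*V - W)');
    fprintf('representation error %8.2e, determinant error %8.2e\n', ...
            norm(Acmp - Ak), abs(det(Ak) - gamma^(n-kk-1)*det(H)));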
Limited Memory Strategy

Since we have managed to eliminate the intermediate approximations $A_j$ from the representation of $A_k$ and its inverse or adjugate, it is in fact quite easy to throw out or amalgamate older directions $v_j$ and the corresponding adjoints $v_j^\top F'_j$ from $V_k$ and $W_k$, respectively. Then the corresponding rows and columns of $V_k^\top V_k$ and most importantly $H_k$ disappear or are merged as well, which amounts to a rank-two correction that is easily incorporated into the inverse or a factorization. Hence we have the capacity to always only use a window of $m$ comparatively recent pieces of direct and adjoint secant information, a strategy that is used very successfully in limited memory BFGS. In a first test implementation we simply choose a fixed maximum $m$ and (over)write $v_k$ for $k > m$ into the $([(k-1) \bmod m] + 1)$-th column of $V_m$. Obviously $W_m$ and $H_m$ are treated accordingly. As we will show below, we find in the affine case that $V_k$ is orthogonal so that $L_k = I$, $R_k = 0$ and $H_k$ is actually upper Hessenberg, i.e., has only one nonvanishing subdiagonal. In the limited memory variant the orthogonality of $V_k$ is maintained but the Hessenberg property of $H_k$ is lost.

Step calculation variants

Using some temporary $(k+1)$-vector $t$ the actual computation of the next quasi-Newton step $s_{k+1} = -\delta_{k+1} A_k^{-1} F_k$ can be broken down into the subtasks (see the Matlab sketch below):

  (i)   multiply: $t \leftarrow (\delta_{k+1}/\gamma)\, W_k^\top F_k$,
  (ii)  multiply and decrement: $t \leftarrow \delta_{k+1}\, V_k^\top F_k - t$,
  (iii) solve: $H_k\, t = t$,
  (iv)  multiply and decrement: $s_{k+1} \leftarrow -(\delta_{k+1}/\gamma)\, F_k - V_k\, t$.

The most promising savings are possible in the first step since we have

  $W_k^\top F_k = \big(v_j^\top F'_j F_k\big)_{j=0\ldots k} \approx \big(v_j^\top F'_k F_k\big)_{j=0\ldots k} = V_k^\top F'_k F_k.$

The approximation holds as equality exactly in the linear case, where $F'$ is constant, and thus very nearly in the smooth case. The vector on the right hand side represents in fact newer derivative information than the original one on the left. So we can get by without storing $W_k$ at all, which pretty much halves the total storage requirement as long as $k \ll n$. However, there is another critical issue, namely how we build up the matrix $H_k$. Its new $k$-th column and row compared to $H_{k-1}$ are given by

  $\big(v_j^\top F'_j v_k\big)_{j=0\ldots k} \approx V_k^\top F'_k v_k \in \mathbb{R}^{k+1}$ and $v_k^\top F'_k V_k \approx \big(v_k^\top F'_j v_j\big)_{j=0\ldots k} \in \mathbb{R}^{1\times(k+1)}.$

For the column we may simply use the approximation based on the single, current directional derivative $F'_k v_k$. For the row we have at least three different choices. Firstly, we can compute the adjoint $v_k^\top F'_k$ but not store it for any longer. Secondly, we store all the directional derivatives $F'_j v_j$ for $j = 0 \ldots k$. Finally, we can rely on the near upper Hessenberg property of $H_k$ and only compute the last two entries $v_k^\top F'_{k-1} v_{k-1}$ and $v_k^\top F'_k v_k$. The third option requires virtually no extra storage other than that of $V_k$ and $H_k$ in Hessenberg form. In that way the whole calculation would reduce almost exactly to the GMRES procedure except for the strictly upper triangular correction $R_k$, which is theoretically zero in the linear case.
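In Matlab the four subtasks (i)-(iv) look as follows. This is a sketch under the same naming assumptions as before; in the actual code $H_k$ would be kept in QR-factorized form rather than solved with the backslash operator.

    % Step computation s_{k+1} = -delta*inv(A_k)*F_k via Lemma 4.
    t = (delta/gamma)*(W'*Fk);      % (i)
    t = delta*(V'*Fk) - t;          % (ii)   t = delta*V_k'*F_k - t
    t = H\t;                        % (iii)  solve H_k t = t
    s = -(delta/gamma)*Fk - V*t;    % (iv)   s_{k+1}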
For the solution of the nonlinear test problems in Section 6 we used the following three variants of the adjoint Broyden method:

(0) original adjoint Broyden update method storing $V_k$, $W_k$, and the QR factorization of $H_k$. This requires evaluation of $F(x_k)$ and $v_k^\top F'(x_k)$ at each iterate.

(1) minimal storage implementation using only $V_k$, the QR factorization of $H_k$, and approximating $W_k^\top v \approx V_k^\top (F'_k v)$. Requires evaluation of $F(x_k)$, $F'(x_k)s_k$, and $v_k^\top F'(x_k)$.

(2) forward mode based implementation using $V_k$, $Z_k = [F'_j v_j]_{j=0\ldots k}$, the QR factorization of $H_k$, and approximating $W_k^\top v \approx V_k^\top (F'_k v)$ and $v_k^\top F'_k V_k \approx v_k^\top Z_k$. Requires $F(x_k)$ and $F'(x_k)v_k$.

Of course, it is also possible to implement method (2) based on finite difference approximations to the directional derivatives $F'(x)v$. However, preliminary numerical tests showed that convergence of this variant is rather unreliable. For affine problems the Jacobian of $F$ is constant and hence the variants (0) to (2) yield up to round-off identical numerical results.

5 Reduction to GMRES/FOM in the Affine Case

For the following result we assume that the adjoint Broyden method is applied with virtually arbitrary step-multipliers $\alpha_k$. Naturally, whenever $\alpha_k = 0$ we have to apply the tangent update, which could however also be approximated by a divided difference. Now we obtain the main theorem of this paper.

Theorem 5 Suppose Algorithm 3 is applied in exact arithmetic with stopping tolerance $\varepsilon = 0$ to an affine system $F(x) = Ax - b$ with $\det(A) \ne 0$. Then:

(i) If $\gamma > 0$ the iteration performs exactly the Arnoldi process irrespective of the choice of the $\alpha_k \in \mathbb{R}$. If $\gamma < 0$ the $v_k$ and the corresponding entries in $H_k$ may differ in sign.

(ii) With $\hat k \le n$ the first index such that $\sigma_{\hat k} = 0$ we have $x_* = x_{\hat k - 1} + s_{\hat k}/\delta_{\hat k} = A^{-1}b$. This final step is well defined and must be taken as $F_{\hat k - 1} \ne 0 \ne s_{\hat k}$ and $\delta_{\hat k} = \det(A_{\hat k - 1}) \ne 0$.

(iii) For $k < \hat k$ all full steps $x_k = x_{k-1} + s_k/\delta_k$ with $\delta_k = \det(A_{k-1})$ (would) lead to points that coincide with the $k$-th iterate of the full orthogonalization method (FOM).

(iv) If (linearly) exact line-searches are used throughout, the resulting iterates $x_k$ coincide with those generated by GMRES.

Proof: In the affine case we may always use the tangent option for $\sigma_k$ so that the only impact of the step size choices $\alpha_k$ on the principal quantities $A_k$ and $v_k$ appears to be via the residuals $F_k = A x_k - b$. As we will see, there is in fact no such dependence, but we can certainly state already now that for any particular sequence of multipliers there must be a certain first $\hat k$ for which $\sigma_{\hat k} = 0$. The adjoint heredity property discussed in Section 2 implies that for $\hat k > k > j \ge 0$

  $\sigma_k^\top \sigma_j = s_k^\top D_{k-1}^\top \sigma_j = 0,$

so that $V_k^\top V_k = I_{k+1}$ and consequently $L_k = I_{k+1}$, $R_k = 0$ in the representations of $A_k$ and $\operatorname{adj}(A_k)$. Assuming that $F(0) = -b \ne 0$ and $\det(A) \ne 0$ we find that $1 \le \hat k \le n$ since no more than $n$ orthogonal directions $v_k$ can exist in $\mathbb{R}^n$. Now we establish the following relations by induction on $k = 1, 2, \ldots, \hat k$:

  (v)   $v_{k-1} \in \mathcal{K}_k \equiv \operatorname{span}\{b, Ab, \ldots, A^{k-1}b\} = \operatorname{span}\{v_0, v_1, \ldots, v_{k-1}\} \subset \mathbb{R}^n$,
  (vi)  $F_{k-1} \in \mathcal{K}_{k-1} + A\,\mathcal{K}_{k-1} = \mathcal{K}_k$,
  (vii) $s_k \in \mathcal{K}_k \ni x_k$.

All three assertions hold clearly for $k = 1$ where the Krylov subspace $\mathcal{K}_1$ is just the span of $F(0) = -b$, and $v_0 = \sigma_0/\|\sigma_0\|$ is selected by the residual option $\sigma_0 = F(0) = -b$. To progress from $k$ to $k+1$ we note that

  $\sigma_k = D_{k-1} s_k = A_{k-1} s_k - A s_k = -F_{k-1}\delta_k - A s_k \in \mathcal{K}_k + A\mathcal{K}_k = \mathcal{K}_{k+1},$

which proves (v) since $v_k$ is collinear to $\sigma_k$ and orthogonality proves that their span is the whole of $\mathcal{K}_{k+1}$. Similarly we have

  $F_k = F_{k-1} + \alpha_k A s_k \in \mathcal{K}_k + A\mathcal{K}_k = \mathcal{K}_{k+1},$

which proves (vi). From the representation of $\operatorname{adj}(A_k)$ in Lemma 4 we see that by (v)

  $s_{k+1} = -\operatorname{adj}(A_k)\, F_k \in \mathcal{K}_{k+1} + \operatorname{range}(V_k) = \mathcal{K}_{k+1},$

which proves (vii) as the assertion for $x_{k+1} = x_k + \alpha_{k+1} s_{k+1}$ is obvious. Since the $v_j$ are orthogonal and span successively the Krylov subspaces $\mathcal{K}_k$ they must be identical (up to sign changes) to the bases generated by the Arnoldi process.
As a consequence it is well known that each $A v_{i-1} \in \mathcal{K}_{i+1}$ is a linear combination of the $v_j$ with $j = 0 \ldots i$ so that there is an upper Hessenberg matrix $\bar H_k \in \mathbb{R}^{(k+2)\times(k+1)}$ such that

  $A V_k = V_{k+1} \bar H_k$ and $V_k^\top A V_k = H_k.$

Here $H_k$ is for any $k < \hat k$ exactly the $(k+1)\times(k+1)$ matrix occurring in Lemma 4 and can be obtained from $\bar H_k$ by simply leaving off the last row. It would be nice to show that the subdiagonal elements of $H_k$ are positive to have complete coincidence with Arnoldi, but that is not an essential property. In any case it follows from our Lemma 2, in agreement with [Saa03], that $\det(A) \ne 0$ implies that the rectangular matrix $\bar H_k$ always has full column rank $k+1$ and $H_k$ has therefore at least rank $k$. Hence the adjugates $\operatorname{adj}(A_k)$ and $\operatorname{adj}(H_k)$ are always nontrivial. Moreover, since the elements in the subdiagonal of $H_k$ are all nonzero, we know that a left nullvector $t^\top$ of $H_k$ must have a nontrivial first component if it exists at all.

Now we can prove the remaining assertions in an explicit fashion. Firstly we obtain for the step $s_{k+1}$, using the factorized representation of the adjugate from Lemma 4 and the identity $W_k = A^\top V_k$ with $\delta_{k+1} = \det(A_k)$,

  $s_{k+1} = -\big[\delta_{k+1} I/\gamma + \gamma^{\,n-k-1}\, V_k \operatorname{adj}(H_k)\, V_k^\top (I - A/\gamma)\big]\, F_k = -\gamma^{\,n-k-1}\, V_k \operatorname{adj}(H_k)\, V_k^\top F_k,$

where we have used that $F_k = V_k V_k^\top F_k$ and $\operatorname{adj}(H_k) H_k = \det(H_k) I$, so that the $A F_k/\gamma$ term cancels out against $\delta_{k+1} F_k/\gamma$. Moreover it follows for the tangent direction that

  $\sigma_{k+1} = (A_k - A)\, s_{k+1} = -F_k \delta_{k+1} + \gamma^{\,n-k-1}\, A V_k \operatorname{adj}(H_k)\, V_k^\top F_k$
  $\qquad = -\big[\delta_{k+1} I - \gamma^{\,n-k-1}\, A V_k \operatorname{adj}(H_k)\, V_k^\top\big]\, F_k$
  $\qquad = -\big[\delta_{k+1} I - \gamma^{\,n-k-1}\, A V_k \operatorname{adj}(H_k)\, V_k^\top\big]\, F_0$
  $\qquad = -\delta_{k+1} F_0 + \gamma^{\,n-k-1}\, A V_k \operatorname{adj}(H_k)\, e_0\, \|F_0\|,$

where $e_0$ is the first Cartesian basis vector. The last simplifications come about because $F_k - F_0 \in A\mathcal{K}_k$ belongs to the null space of the matrix in square brackets and $v_0 = F_0/\|F_0\|$. Hence we see that indeed the $\sigma_k$, and thus the $v_k$ and $A_k$ for $k < \hat k$, are completely independent of the choice of the multipliers $\alpha_j$, which may produce an arbitrary residual $F_k \in F_0 + A\mathcal{K}_k$. Moreover it follows from Cramer's rule that the last component in the vector $\operatorname{adj}(H_k)\, e_0$ is exactly the product of the $k$ subdiagonal elements of the Hessenberg matrix $H_k$, which are well known to be positive in the Arnoldi process. Hence this property is maintained by induction if $\gamma > 0$.

Now let us consider the final situation $\sigma_{\hat k} = 0$. By definition the previous $\sigma_k$ for $k < \hat k$, and thus the $F_k$ for $k \le \hat k - 1$ and the corresponding subdiagonals of $H_k$, cannot vanish. Thus we must have

  $0 = \sigma_{\hat k} = -\delta_{\hat k} F_0 + \gamma^{\,n-\hat k}\, \|F_0\|\, A V_{\hat k - 1} \operatorname{adj}(H_{\hat k - 1})\, e_0.$

Since the columns of $A V_{\hat k - 1}$ are linearly independent and $\operatorname{adj}(H_{\hat k - 1})\, e_0\, \|F_0\|$ cannot be zero, neither $F_{\hat k - 1}$ nor $\delta_{\hat k}$ can vanish, so that the last step $s_{\hat k}$ is neither zero nor singular. That implies that

  $F_{\hat k} = F_{\hat k - 1} + A s_{\hat k}/\delta_{\hat k} = -\sigma_{\hat k}/\delta_{\hat k} = 0.$

Generally we have after each full step that $F_k$ is a multiple of $\sigma_k$, which belongs to the orthogonal complement of $\mathcal{K}_k$. That is exactly the defining property of an FOM iterate so that we have now proven (ii) and (iii). Since $\alpha_k^*$ is obtained by a line-search minimizing $\|F_{k-1} + \alpha A s_k\|^2$ we must have exactly $F_k^\top A s_k = 0$. We now prove by induction on $k < \hat k$ the defining property of GMRES, namely that $F_k^\top A s_j = 0$ for all $0 < j \le k$. It does hold for $k = 1 = j$ as we have just shown. Since for $k > 1$

  $F_k = F_{k-1} + \alpha_k A s_k = -\alpha_k D_{k-1} s_k + (1 - \alpha_k \delta_k)\, F_{k-1} = -\alpha_k \sigma_k + (1 - \alpha_k \delta_k)\, F_{k-1},$

the orthogonality of $F_k$ to all $A s_j$ for $j < k$ follows from the induction hypothesis $F_{k-1}^\top A s_j = 0$ and the fact that $\sigma_k \perp \mathcal{K}_k \ni A s_j$.
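The following self-contained Matlab sketch illustrates assertion (iv) on a random affine system. It implements the dense recursion of Section 2 with residual initialization, tangent updates and linearly exact line-searches, and compares against Matlab's gmres; the problem data, tolerances and all variable names are our own assumptions.

    % Dense adjoint Broyden versus GMRES on F(x) = A*x - b (a sketch).
    rng(1); n = 40; A = randn(n) + 4*eye(n); b = randn(n,1);
    x = zeros(n,1); F = -b;                 % F_0 = F(0) = -b
    v = F/norm(F); g = sign(v'*(A*v))*norm(A*v);
    Ak = g*eye(n) - (v*v')*(g*eye(n) - A);  % initial residual update
    for k = 1:n
      s = -(Ak\F);                          % quasi-Newton direction, delta_k = 1
      d = A*s;
      alpha = -(F'*d)/(d'*d);               % linearly exact line-search
      x = x + alpha*s; F = F + alpha*d;
      sigma = Ak*s - d;                     % tangent option
      xg = gmres(A, b, [], eps, k);         % k unrestarted GMRES steps
      fprintf('k = %2d: ||x - x_gmres|| = %8.2e\n', k, norm(x - xg));
      if norm(sigma) <= 1e-13*norm(b), break, end
      v = sigma/norm(sigma);
      Ak = Ak - (v*v')*(Ak - A);            % tangent update
    end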
To illustrate the above result in an extreme situation let us consider the case where $A$, or more generally $A A_0^{-1}$, is equal to the cyclic right shift matrix so that for any vector $u = (u_1, u_2, \ldots, u_{n-1}, u_n)^\top \in \mathbb{R}^n$

  $A\,(u_1, u_2, \ldots, u_{n-1}, u_n)^\top = (u_n, u_1, \ldots, u_{n-2}, u_{n-1})^\top.$

In other words $A$ is zero except for 1's in the subdiagonal and the $(1,n)$ element. Since $A A^\top = I$ this cyclic permutation matrix is orthogonal and thus certainly normal, which according to the usual linear algebra folklore suggests that GMRES should not do too badly. In fact we find for the right hand side $b = (1, 0, \ldots, 0, 0)^\top$ and $x_0 = 0$ that by GMRES also $x_k = 0$ for $k = 1 \ldots n-1$ and only the very last, namely $n$-th, step leads to the solution $x_n = x_* = (0, 0, \ldots, 0, 1)^\top$. Moreover the $v_k$ are the Cartesian basis vectors and all matrices $H_k = V_k^\top A V_k$ are singular, which means in particular that FOM is never defined.

6 Numerical results

The adjoint Broyden methods are applied to several nonlinear equation problems. The subset of nonlinear equation problems with variable dimension of the Moré test set [MGH81] is selected. The results for these test problems should give an overview of the performance of the variants of the adjoint Broyden method. Additionally, three specific test problems are selected to investigate the convergence properties of the adjoint Broyden methods in more detail. For that purpose the problem dimensions and initial states are varied. The iteration is globalized by a derivative-free line search in the range of $F$. This line search was proposed in [Gri86] to prove global convergence of Broyden's method and it is adapted to the adjoint Broyden method in [Sch07, Sec. 4.3.2]. The compact storage variants of the adjoint Broyden method are implemented in the code abrnlq2, given as Matlab and C routine. For the considered test problems and the Matlab code derivatives are evaluated by applying AD by hand. The application of the C code uses the AD tool ADOL-C. As proposed in Section 4, we consider three variants of the algorithm. These variants are either applied to the original function or to the preconditioned function, choosing $A_{-1} = F'(x_0)$ or $A_{-1} = \big[F(x_0)^\top F'(x_0) F(x_0)\,/\,F(x_0)^\top F(x_0)\big]\, I$. The nonlinear equation problems with scalable dimension of the Moré test set are given in Table 1.

Table 1: Nonlinear equation problems of the Moré test set

  Number  Name                                 Reference
  (21)    Extended Rosenbrock function         [Spe75]
  (22)    Extended Powell singular function    [Spe75]
  (26)    Trigonometric function               [Spe75]
  (27)    Brown almost-linear function         [Bro69]
  (28)    Discrete boundary value function     [MC79]
  (29)    Discrete integral equation function  [MC79]
  (30)    Broyden tridiagonal function         [Bro65]
  (31)    Broyden banded function              [Bro71]

The column Number represents the number of the problem in [MGH81]. Additionally the performance of the adjoint Broyden updates is examined in more detail for three specific test problems:

Test function 1: The discrete integral equation function (29) in the Moré test set given by $x = (x_{(i)})_{i=1\ldots n}$, $F(x) = (f_i(x))_{i=1\ldots n}$, and

  $f_i(x) = x_{(i)} + \frac{h}{2}\Big[(1 - t_i)\sum_{j=1}^{i} t_j\, (x_{(j)} + t_j + 1)^3 + t_i \sum_{j=i+1}^{n} (1 - t_j)\, (x_{(j)} + t_j + 1)^3\Big].$

Here $h = 1/(n+1)$ and $t_i = ih$. The function $F$ is differentiable and its Jacobian is dense. The default initial iterate is chosen by $x_0 = (t_i(t_i - 1))_{i=1\ldots n}$.

Test function 2: The extended Rosenbrock function (21) in the Moré test set given by $x = (x_{(i)})_{i=1\ldots n}$, $F(x) = (f_i(x))_{i=1\ldots n}$, and

  $f_i(x) = \begin{cases} 10\,(x_{(i+1)} - x_{(i)}^2) & \text{if } i \text{ odd}, \\ 1 - x_{(i-1)} & \text{if } i \text{ even}. \end{cases}$

The function is differentiable and its Jacobian is tridiagonal.
The default initial iterate is chosen by $x_0 = (-1.2, 1, -1.2, 1, \ldots)$.

Test function 3: A matrix $X \in \mathbb{R}^{d \times d}$ is sought as the matrix cube root for a given real diagonalizable matrix $Z \in \mathbb{R}^{d \times d}$, i.e.,

  $X^3 = X\, X\, X = Z.$   (3)

The eigenvalue decomposition $Z = T D T^{-1}$ yields the diagonal matrix $D = \operatorname{diag}\{\lambda_1, \ldots, \lambda_d\}$. Denoting $D^{1/3} = \operatorname{diag}\{\lambda_1^{1/3}, \ldots, \lambda_d^{1/3}\}$, one obtains for $X = T D^{1/3} T^{-1}$ the identity $X^3 = T D T^{-1} = Z$. Thus problem (3) has a solution and can be formulated as a nonlinear equation problem by

  $F(X) = X^3 - Z = 0 \in \mathbb{R}^{d \times d}$

with dimension $n = d^2$. In the implementation the matrix $X$ is associated row-wise with the state vector $x = (x_{(i)})$, where $x_{([k-1]d + l)} = X_{k,l}$ for $k, l = 1, \ldots, d$. Here we choose $Z = \operatorname{tridiag}(-1, 2, -1)$. As default initial iterate the identity matrix $X_0 = I \in \mathbb{R}^{d \times d}$ is used. Note that the $(i,j)$-th entry of $X$ impacts all elements in the $i$-th row as well as the $j$-th column of $X^2$. Consequently the same entry impacts all elements of $X^3$, which means that the Jacobian of this test function $F(X)$ is dense and thus has $d^4$ nonzero entries.

Convergence results for Moré test set functions

To illustrate the performance of the adjoint Broyden update methods, the numbers of iterations needed to reach convergence with a reasonable tolerance are compared. Additionally, the run times required for the whole iteration process are stated. For that purpose, the C version of the program is compiled using gcc 4.1 and executed on a PC with an AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ with 2 GHz and 512 KB cache size.

The results for the higher dimensional nonlinear equation problems of the Moré test set with default initial iterates are displayed in Table 2. The compact storage representation of the adjoint Broyden method is compared to the full storage representation based on updating an LU factorization of $A_k$. The update is evaluated by an algorithm of Bennett [Ben65]. The numbers in the first column refer to the number of the test problem in [MGH81]. If not otherwise stated, these tests are performed for the dimension n = 1000 using the initial iterates as proposed in the test set. The iteration is performed up to a tolerance of tol_F = 10^-14 in the residual $\|F(x_i)\|_2$ and at most 500 iterations.

Table 2: Results of Moré test set for default initial iterates

  Test problem   full adj. Broy.      variant (0)        variant (1)        variant (2)
  (21)           -                    183/0.64/177;0     190/0.59/184;0     -
  (21) (P1)      15/0.36/24;24        14/0.40/24;24      20/0.78/26;26      -
  (22)           -                    44/0.05/9;4        44/0.05/14;7       -
  (22) (P1)      28/0.68/0;0          28/0.79/0;0        28/1.10/0;0        -
  (26)¹          14/0.36/3;1          13/0.03/3;0        14/0.04/2;0        116/0.41/117;7
  (26)¹ (P1)     17/0.43/1;0          17/0.49/1;0        21/0.85/4;1        -
  (27)²          9/3.1e-4/1;0         9/3.9e-4/0;0       9/4.1e-4/0;0       226/0.13/0;0
  (27)² (P1)     237/9.4e-3/464;464   237/0.17/464;464   276/0.27/547;545   -
  (28) (P1)      4/0.10/0;0           4/0.13/0;0         4/0.16/0;0         4/0.08/0;0
  (29)           8/5.39/0;0           7/5.13/0;0         8/8.75/0;0         8/5.27/0;0
  (29) (P1)      5/3.25/0;0           5/3.98/0;0         6/6.86/0;0         6/4.18/0;0
  (30)           51/1.26/2;0          51/0.09/2;0        53/0.09/1;0        89/0.14/1;0
  (30) (P1)      15/0.37/0;0          15/0.44/0;0        15/0.61/0;0        18/0.35/0;0
  (31)³          55/1.42/18;0         42/0.10/10;0       30/0.09/3;0        70/0.17/58;0
  (31)³ (P1)     19/0.49/0;0          19/0.54/0;0        18/0.72/0;0        36/0.68/1;0

Each entry lists (a) iteration counts / (b) run times in seconds / (c) additional line-search trials; sign changes in the step multiplier, and "-" marks failure to converge. (P1): preconditioned problem with $A_{-1} = F'(x_0)$; default problem dimension n = 1000.
¹ Initial iterate chosen as $x_0 = \frac{1}{2}\hat x_0$ with $\hat x_0$ as proposed in the test set; otherwise no convergence is achieved for dimension n = 1000.
² Dimension is n = 10, tolerance is tol_F = 10^-12.
³ Tolerance is tol_F = 10^-12.
As one can see, nothing is gained by the compact storage implementations when the initial Jacobian $F'(x_0)$ is evaluated, factorized and then used as a preconditioner, which is mathematically equivalent to starting the adjoint Broyden method with $A_0 = F'(x_0)$. Then there is essentially no saving with regards to the linear algebra effort. However, on the test problems 21 and 22 our dense implementation of full adjoint Broyden does not work at all, whereas the first two compact storage versions work quite nicely. Judging by our experience so far, the trouble of evaluating adjoint vectors, i.e. row-vector Jacobian products, seems to pay off since the third version based exclusively on Jacobian-vector products performs significantly worse on these smooth but nonlinear problems. All three versions generate identical iterates on affine problems, of course. On test problem 3 with diagonal preconditioning the first two compact storage versions generate virtually the same iterates as the full storage version but the run-time is reduced by a factor of ten, which is not surprising since n = 1000. Similar benefits are obtained for problems 30 and 31 when the preconditioner is a multiple of the identity. Here preconditioning by the initial Jacobian reduces the number of iterations but does prolong the runtime significantly. Hence we may conclude that the compact storage implementation is indeed quite efficient, especially when the overall number of steps is only a fraction of the problem dimension.

In addition the problems of the Moré test set are solved for initial iterates further away from the solution. The approach of multiplying the initial iterate by a scalar factor to test the performance of a method is suggested in [MGH81]. Table 3 displays the required iterations and run times. The choice of the dimension n and the tolerance tol_F for these test problems is the same as before.

Table 3: Results of Moré test set for distant initial iterates

  Test problem  init. iterate  full adj. Broy.    variant (0)       variant (1)       variant (2)
  (21) (P1)     100x_0         6/0.15/2;2         4/0.13/2;2        12/0.46/4;4       -
  (22)          100x_0         -                  45/0.05/9;5       65/0.09/27;12     -
  (22) (P1)     100x_0         34/0.81/0;0        34/0.92/0;0       34/1.30/0;0       -
  (26)          10x_0          -                  34/0.08/18;10     24/0.08/5;2       130/0.49/180;89
  (26) (P1)     10x_0          47/1.21/1;0        47/1.34/1;0       48/1.94/1;0       185/3.88/11;7
  (27)⁴         20x_0          18/5.9e-4/7;3      18/8.4e-4/3;1     148/0.04/164;54   -
  (27)⁴ (P1)    20x_0          58/2.1e-3/63;44    58/4.7e-3/40;18   93/0.01/110;92    -
  (28) (P1)     100x_0         24/0.59/1;0        24/0.69/1;0       24/0.96/2;0       -
  (29)          100x_0         20/18.08/1;0       16/10.57/0;0      14/14.87/0;0      38/25.17/9;0
  (29) (P1)     100x_0         26/16.71/0;0       26/17.51/0;0      26/28.22/0;0      -
  (30)⁵         100x_0         -                  -                 67/0.12/8;0       -
  (30)⁵ (P1)    100x_0         56/1.37/20;0       57/1.62/20;0      57/2.28/17;0      -
  (31)⁵         10x_0          -                  -                 48/0.16/7;0       -
  (31)⁵ (P1)    10x_0          35/0.90/8;0        35/1.03/8;0       35/1.44/10;0      -

Entries and conventions as in Table 2; default problem dimension n = 1000.
⁴ Dimension is n = 10, tolerance is tol_F = 10^-12.
⁵ Tolerance is tol_F = 10^-12.
From remoter initial points the difference between Version 1 and 2 of the compact storage implementation becomes more marked. The latter requires only about half the storage but seems to do a better job at discarding older information as described at the end of Section 4. Hence it succeeds on problems 30 and 31 with diagonal preconditioning where the original method fails. Obviously, some kind of restart must be developed, especially in view of problem 27 where the iteration counts exceed the dimension. In such cases one also needs a transition from the compact to the full storage scheme, which is yet to be developed.

Performance on special test problems 1-3

For the specific test problem functions, varying problem dimensions and initial iterates, the Matlab version of the adjoint Broyden variants is compared to the built-in Matlab function fsolve for the solution of nonlinear equations. For test functions 1 and 3 tol_F = 10^-14, and for test function 2 tol_F = 10^-12. The maximal number of iterations allowed is again i_max = 500. Here the run times for the preconditioned problems include the run time to evaluate and factorize the initial Jacobian. Apparently Matlab uses some version of the Levenberg-Marquardt method, which leads to significantly smaller iteration counts compared to the diagonally preconditioned adjoint Broyden method. However, the total runtimes are always significantly larger, presumably because a lot of effort goes into the differencing for Jacobian approximations. For remote initial points the preconditioning may not pay even in terms of the iteration number, and certainly not with respect to the run-time. Obviously the very cheap diagonal preconditioning approach is a good idea and sometimes makes the difference between success and failure. So far we have fixed the diagonal scaler at the initial point, whereas the compact storage representation easily allows readjusting it repeatedly at virtually no extra cost.
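For reference, the cheap diagonal preconditioning just mentioned, labeled (P2) in the tables below, can be sketched in Matlab under the same naming assumptions as in the earlier fragments:

    % Diagonal preconditioning (P2): A_{-1} = (F0'*Fp0*F0)/(F0'*F0) * I,
    % i.e. one tangent product F'(x0)*F(x0) instead of a Jacobian factorization.
    w = Fp0*F0;                   % F'(x0) F(x0), or a divided difference
    gamma0 = (F0'*w)/(F0'*F0);    % scalar diagonal scaler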
Table 4: Results of test function 1

  Test problem      variant (0)    variant (1)    variant (2)    fsolve
  default initial iterate 100x_0, varying problem dimension:
  n = 10            16/1.5e-2      14/1.2e-2      37/2.5e-2
  n = 10 (P1)       21/2.3e-2      21/2.2e-2      -              12/0.21
  n = 10 (P2)       22/2.2e-2      21/2.0e-2      -
  n = 100           16/5.5e-2      14/5.4e-2      37/0.14
  n = 100 (P1)      26/0.14        26/0.15        -              13/3.38
  n = 100 (P2)      22/9.1e-2      22/0.10        -
  n = 1000          16/1.48        14/1.72        38/3.41
  n = 1000 (P1)     26/34.89       26/36.53       -              15/515
  n = 1000 (P2)     22/1.83        22/2.78        -
  default problem dimension n = 100, varying scaling of initial iterate:
  10x_0             8/3.3e-2       8/3.5e-2       12/4.3e-2
  10x_0 (P1)        8/8.2e-2       8/8.6e-2       10/8.4e-2      8/1.9
  10x_0 (P2)        9/4.2e-2       9/4.3e-2       10/4.2e-2
  500x_0            49/0.21        18/6.8e-2      98/0.51
  500x_0 (P1)       75/0.40        75/0.46        -              19/4.6
  500x_0 (P2)       32/0.12        32/0.14        -
  1000x_0           81/0.42        20/7.4e-2      101/0.54
  1000x_0 (P1)      121/0.74       121/0.82       -              23/5.6
  1000x_0 (P2)      36/0.15        36/0.17        -

Entries are iteration counts / run times in seconds; "-" marks failure to converge. (P1): preconditioned problem with $A_{-1} = F'(x_0)$; (P2): preconditioned problem with $A_{-1} = \big[F(x_0)^\top F'(x_0) F(x_0)\,/\,F(x_0)^\top F(x_0)\big]\, I$.

Table 5: Results of test function 2

  Test problem      variant (0)    variant (1)    fsolve
  default initial iterate 100x_0, varying problem dimension, tol_F = 10^-12:
  n = 10            188/0.38       206/0.50
  n = 10 (P1)       14/1.4e-2      19/1.4e-2      17/0.19
  n = 10 (P2)       146/0.19       148/0.19
  n = 100           182/0.34       189/0.36
  n = 100 (P1)      14/2.6e-2      19/3.0e-2      23/0.40
  n = 100 (P2)      144/0.22       145/0.21
  n = 1000          183/2.0        189/1.7
  n = 1000 (P1)     14/1.3         19/1.9         18/12.0
  n = 1000 (P2)     144/1.5        145/1.1
  default problem dimension n = 100, varying scaling of initial iterate:
  2x_0              28/1.8e-2      371/3.3
  2x_0 (P1)         5/1.8e-2       11/2.2e-2      25/0.28
  2x_0 (P2)         181/0.41       174/0.34
  10x_0             -              497/9.0
  10x_0 (P1)        4/1.7e-2       10/2.1e-2      10/0.16
  10x_0 (P2)        437/6.9        387/4.3
  100x_0            -              -
  100x_0 (P1)       4/1.7e-2       11/2.1e-2      13/0.14
  100x_0 (P2)       -              -

Entries and conventions as in Table 4.

Table 6: Results of test function 3

  Test problem      variant (0)    variant (1)    variant (2)    fsolve
  default initial iterate X_0 = I, varying problem dimension:
  n = 100           23/3.1e-2      32/5.6e-2      -
  n = 100 (P1)      18/8.2e-2      19/9.3e-2      -              7/0.24
  n = 100 (P2)      18/3.1e-2      19/3.8e-2      -
  n = 1024          42/0.94        48/1.7         -
  n = 1024 (P1)     39/15.0        40/15.9        -              9/52.4
  n = 1024 (P2)     39/0.91        40/1.36        -
  n = 4900          74/16.2        79/27.1        -
  n = 4900 (P1)     65/704         67/724         -              10/2772
  n = 4900 (P2)     65/14.8        67/21.9        -
  default problem dimension n = 100, varying scaling of initial iterate:
  10X_0             65/9.0e-2      27/4.4e-2      145/0.33
  10X_0 (P1)        35/0.11        34/0.13        -              15/0.52
  10X_0 (P2)        35/5.5e-2      34/6.7e-2      -
  100X_0            210/0.84       36/5.7e-2      212/0.80
  100X_0 (P1)       49/0.14        49/0.17        -              24/0.82
  100X_0 (P2)       49/7.3e-2      50/0.10        -
  1000X_0           -              46/7.2e-2      175/0.57
  1000X_0 (P1)      60/0.16        60/0.20        -              31/1.08
  1000X_0 (P2)      60/9.1e-2      60/0.12        -

Entries and conventions as in Table 4.

Comparison of limited memory approaches

Finally, we report some preliminary results on the limited memory implementation sketched in Section 4, where the columns of $V_k$, $W_k$ and $H_k$ are periodically overwritten once $k$ exceeds a
certain limit $m$. We use a linear test problem so that there is no mathematical difference between our various compact storage versions and we may use the one that has almost exactly the same memory requirement as GMRES, namely the one storing the $n \times m$ matrix $V$ and the $m \times m$ matrix $H$. Figure 1 compares Matlab's GMRES solver with restart to the limited memory implementation of the adjoint Broyden method. For this test the 2D Poisson equation with Dirichlet boundary conditions on a square domain and five-point discretization is solved. Although this yields a symmetric linear problem, which could be tackled by a CG method, we use it here to compare the general nonsymmetric solvers. The dimension of the test problem is n = 100 and it is solved up to a tolerance of tol_F = 10^-12. Without limiting the memory, GMRES and adjoint Broyden are mathematically equivalent Krylov space methods and reach the required tolerance at the 15th iteration. By restricting the number $m$ we destroy the Krylov subspace property and the convergence becomes significantly slower. As one can see from the plot in Figure 1, GMRES(m) takes about twice as many steps as our 'periodic' adjoint Broyden version. That may be explainable by the fact that GMRES with restart every $m$ steps utilizes on average information in only $m/2$ directions about the problem function, whereas adjoint Broyden uses $m$ of them throughout. Strictly speaking this means also that as far as the linear algebra is concerned the GMRES(m) iterations cost only about half as much, though there is probably a lot of common overhead, especially if $m$ is small. Our main concern is of course the number of iterations since we assume that each function evaluation is quite expensive.

Figure 1: Comparison of limited memory adjoint Broyden and GMRES. [Plot: total number of required iterations (up to 250, with both methods reaching the tolerance at 15 iterations without memory limitation) versus the number of allowed inner iterations in GMRES and the number of stored updates (plus initial update) in adjoint Broyden's method (2 to 16); curves: Matlab's GMRES function with restart, and the limited memory adjoint Broyden method.]
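For reference, the linear test behind Figure 1 can be set up along the following lines in Matlab; the right hand side and the exact memory values are our assumptions, since the text only fixes the five-point Poisson matrix, n = 100 and tol_F = 10^-12.

    % 2D Poisson matrix (five-point stencil, Dirichlet) via Kronecker sums
    % and restarted GMRES for increasing memory m.
    m1 = 10; n = m1^2;                               % n = 100
    T = spdiags(ones(m1,1)*[-1 2 -1], -1:1, m1, m1);
    A = kron(speye(m1), T) + kron(T, speye(m1));
    b = ones(n,1);
    for m = 2:2:16
      [~,flag,~,iter] = gmres(A, b, m, 1e-12, 500);
      fprintf('GMRES(%2d): %4d iterations (flag %d)\n', ...
              m, (iter(1)-1)*m + iter(2), flag);
    end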
7 Conclusion and Outlook

In this paper we have developed several compact storage implementations of the adjoint Broyden method and shown that on affine problems they all yield iterates identical to GMRES. For that result we assumed exact line-searches, which is quite natural and realistic in the affine case. From a numerical linear algebra point of view our treatment is somewhat unsatisfactory in that we have barely given any consideration to issues of round-off propagation. In particular we have not worried about the fact that applying the compact representation of the inverse of $A_k$ or its adjugate to the current residual amounts to orthogonalization by unmodified Gram-Schmidt. From a more nonlinear point of view, getting approximating Jacobians right to within a couple of digits is already a quite satisfactory achievement, so that numerical effects at the level of the machine precision or even its root are of little concern. Nevertheless it should be investigated whether one may design an implementation for general nonlinear problems that automatically reduces to the standard GMRES procedure on affine problems. For the nonlinear scenario, of greater importance are issues related to the diagonal (re)scaling $\gamma$ of the initial Jacobian and the thorny issue if and how to reset the procedure when the storage limits are reached or older information appears to become obsolete. For that purpose one might monitor the subdiagonal entries in the projected Jacobian $H_k$ or the entries in $R_k$. The latter must all vanish exactly in the affine case and should therefore be rather small near the roots of smooth functions. Their relative size might also allow a smarter selection of the directions to be discarded.

8 Acknowledgements

The first author performed the research for this paper at IRISA Rennes, where he greatly benefited from the hospitality and the GMRES expertise of Bernard Philippe and his colleagues.

References

[Ben65] J. M. Bennett. Triangular factors of modified matrices. Numerische Mathematik, 7:217-221, 1965.
[Bro65] C. G. Broyden. A class of methods for solving nonlinear simultaneous equations. Math. Comp., 19:577-593, 1965.
[Bro69] K. M. Brown. A quadratically convergent Newton-like method based upon Gaussian elimination. SIAM J. Numer. Anal., 6:560-569, 1969.
[Bro71] C. G. Broyden. The convergence of an algorithm for solving sparse nonlinear systems. Math. Comp., 25:285-294, 1971.
[Gri86] A. Griewank. The "global" convergence of Broyden-like methods with a suitable line search. J. Austral. Math. Soc. Ser. B, 28:75-92, 1986.
[GSW06] A. Griewank, S. Schlenkrich, and A. Walther. A quasi-Newton method with optimal R-order without independence assumption. MATHEON Preprint 340, 2006. Submitted to Opt. Meth. and Soft.
[MC79] J. J. Moré and M. Y. Cosnard. Numerical solution of nonlinear equations. TOMS, 5:64-85, 1979.
[MGH81] J. J. Moré, B. S. Garbow, and K. E. Hillstrom. Testing unconstrained optimization software. TOMS, 7:17-41, 1981.
[NW99] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, 1999.
[Saa03] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
[Sch07] S. Schlenkrich. Adjoint-based Quasi-Newton Methods for Nonlinear Equations. Sierke Verlag, in press, 2007.
[SGW06] S. Schlenkrich, A. Griewank, and A. Walther. Local convergence analysis of TR1 updates for solving nonlinear equations. MATHEON Preprint 337, 2006.
[Spe75] E. Spedicato. Computational experience with quasi-Newton algorithms for minimization problems of moderately large size. Rep. CISE-N-175, Segrate (Milano), 1975.
[SW06] S. Schlenkrich and A. Walther. Global convergence of quasi-Newton methods based on adjoint tangent rank-1 updates. TU Dresden Preprint MATH-WR02-2006, 2006. Submitted to Applied Numerical Mathematics.