Merging Trust-Region and Limited Memory Technologies for Large-Scale Nonlinear Optimization James Burke Andreas Weigmann Xu Liang James Burke Andreas Weigmann Xu Liang () April 7, 2008 TR-L-BFGS 4/7/2008 1 / 33 Outline 1. Introduction. • General format of large-scale nonlinear optimization problem. • Main difficulties. 2. Large-scale unconstrained problem. • Description of the algorithm. • Numerical Algebra. • Numerical Results. 3. Active-set Trust-region Algorithm(ASTRAL) for bounded constrained problems. • Description of the algorithm. • Numerical Algebra. • Numerical Results. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 2 / 33 Problem Format min f (x) s.t. l ≤x ≤u Assumptions: 1. f : Rn → R is smooth, but difficult to evaluate. high dimensional integration simulation systems of PDE/ODE’s multiple levels of optimization 2. −∞ ≤ li ≤ ui ≤ ∞ 3. n is large. (n ≥ 1000, e.g. n = 219 for imaging applications). James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 3 / 33 Computational Costs (1) Dominant Costs: function and gradient evaluations. Jorge Moré and Nick Gould No test sets now available for hard to evaluate functions. (2) Numerical linear algebra. A very significant concern due to dimensionality. But, due to expensive functions, we want to exploit all data as fully as possible. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 4 / 33 Computational Costs (1) Dominant Costs: function and gradient evaluations. line-search vs trust-region (2) Numerical linear algebra. matrix multiplies vs equation solves James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 5 / 33 The Unconstrained Problem Objective: min f (x) Algorithm: x k +1 = x k + sk Notation: g k = ∇f (x k ), Bk ≈ ∇2 f (x k )−1 , Hk ≈ ∇2 f (x k ) • Line-Search: sk = −λk Bk d k λk a stepsize (weak or strong Wolfe conditions). Requires repeated function and gradient evaluations. • Trust-Region: sk solves James Burke Andreas Weigmann Xu Liang () min (g k )T s + 21 sT Hk s s.t. ksk ≤ ∆k TR-L-BFGS 4/7/2008 6 / 33 Hessian Approximations Scalar Secant Equation: Hk = ∇f (x k ) − ∇f (x k −1 ) x k − x k −1 Matrix Secant Equation: Hk (x k − x k −1 ) = ∇f (x k ) − ∇f (x k −1 ) n linear equations in James Burke Andreas Weigmann Xu Liang () n(n+1) 2 unknowns TR-L-BFGS 4/7/2008 7 / 33 BFGS Update: H0 sym. and positive definite ⇒ Hk sym. and positive definite T Hk +1 = Hk − Hk sk sk Hk s kT Hk sk + ykyk s kT yk T = Hk − [y k Hk sk ] T −sk y k 0 T whenever sk y k > 0 0 T k s Hk sk !−1 [y k Hk sk ]T where sk = x k +1 − x k James Burke Andreas Weigmann Xu Liang () and y k = g k +1 − g k . TR-L-BFGS 4/7/2008 8 / 33 Compact m-Step Representation Nocedal, Byrd and Schnabel (1994) −D L Hk +m = Hk − [Y Hk S] LT T s k Hk s k !−1 [Y Hk S]T where S = [sk , · · · , sk +m−1 ], Y = [y k , · · · , y k +m−1 ], and ST Y = L + D + R with L strictly lower triangular, D diagonal, and R strictly upper triangular. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 9 / 33 Limited Memory BFGS Updating H = λI − ΨΓ−1 ΨT . where T λ= yk yk T sk y , Ψ = [Y , λS], Γ= −D LT L λS T S , 2m×2m ST Y = L + D + R S = [sk1 , · · · , skm ], Y = [y k1 , · · · , y km ] Typically, m = 5. λ-scaling – an approximate Rayleigh quotient Oren-Spedicato (1976), Phua-Shanno (1980), Barzilai-Borwein (1988). James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 10 / 33 The Trust-Region Subproblem min (g k )T s + 12 sT Hk s s.t. ksk ≤ ∆k Apply Newton’s method to find µ > 0 so that φ(µ) = 0 where φ(µ) = 1 1 − ∆k ks(µ)k µ+ = µ − φ(µ) , φ0 (µ) and s(µ) = −(µI + Hk )−1 g k . where φ0 (µ) = − g T (µI + H)−3 g ks(µ)k3 . (Our implementation avoids the hard case.) James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 11 / 33 Powers of (µI + H)−1 (µI + H)−1 = 1 [I + Ψ(τ Γ − ΨT Ψ)−1 ΨT ] τ (µI + H)−2 = 1 [I + Ψ(τ Γ − ΨT Ψ)−1 ΨT + τ Ψ(τ Γ − ΨT Ψ)−1 Γ(τ Γ − ΨT Ψ)−1 ΨT ], τ2 where τ = µ + λ. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 12 / 33 Triangular Factorization of τ Γ − ΨT Ψ Set D̂ = (µ + λ)D + Y T Y , Ŵ = λµS T S, and L̂ = µL − λ(R + D). Compute Cholesky factorizations = M MT = J JT D̂ Ŵ + L̂D̂ −1 L̂T Then T τΓ − Ψ Ψ = M −L̂M −T 0 J −M T 0 M −1 L̂T JT . The trust-region Newton iteration occurs entirely in dimension 2m. Cost ≈ O(mn) assuming m3 << n. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 13 / 33 Numerical Results TEST SET: 24 problems from MINPACK-2 set, 2500 ≤ n ≤ 160, 000. Termination Criteria: fbest ∼ best known function value. 1. |f k − fbest |/ max(1, |fbest |) ≤ . 2. nf ≥ 1000. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 14 / 33 Performance Profile: (Dolen and Moré 2001) Given a problem set P, ti,sj = ri,sj = nfg(CPU time) of solving prob i by alg sj ti,sj minj ti,sj . Set ρsj (τ ) = James Burke Andreas Weigmann Xu Liang () 1 |{i ∈ P : ri,sj ≤ τ }|. np TR-L-BFGS 4/7/2008 15 / 33 Comparison of number of the function and gradient evaluations nfg 1 0.9 0.8 0.7 P 0.6 0.5 0.4 0.3 0.2 0.1 TR LineSearch 0 0 0.5 1 1.5 2 2.5 tau (a) relative accuracy 10−5 James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 16 / 33 Comparison of number of the function and gradient evaluations nfg 1 0.9 0.8 0.8 0.7 0.7 0.6 0.6 P P nfg 1 0.9 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 TR LineSearch 0 0 0.5 1 1.5 2 TR LineSearch 0 2.5 tau 0.5 1 1.5 2 2.5 tau (c) relative accuracy 10−5 James Burke Andreas Weigmann Xu Liang () 0 (d) relative accuracy 10−3 TR-L-BFGS 4/7/2008 16 / 33 Comparison of CPU time cpu 1 0.9 0.8 0.8 0.7 0.7 0.6 0.6 P P cpu 1 0.9 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 TR LineSearch 0 0 0.5 1 1.5 2 TR LineSearch 0 2.5 tau 0.5 1 1.5 2 2.5 tau (e) relative accuracy 10−5 James Burke Andreas Weigmann Xu Liang () 0 (f) relative accuracy 10−3 TR-L-BFGS 4/7/2008 17 / 33 More Implementation Details Initialization: x 0 , g 0 = ∇f (x 0 ), H0 0, 0 < κ < 1, 0 < σ < 1. Iteration: 1. Set s̄ = −Hk−1 g k 2. WHILE and r (s̄) = f (x k +s̄)−f (x k ) . q(s̄) (98% acceptance) r (s̄) < κ a. Let δ = σ ks̄k. b. Solve c. Compute r (s̄) = min q(s) = sT g k + 12 sT Hk s s.t. ||s|| ≤ δ. f (x k +s̄)−f (x k ) . q(s̄) END WHILE 3. Set x k +1 = x k + s̄. Update Hk +1 . James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 18 / 33 Minimization with bounded constraints (P) min f (x) l ≤ x ≤ u. Active-set Trust-region Algorithm (ASTRAL). We use an `∞ trust-region to conform with the constraint geometry. Ω = {x | l ≤ x ≤ u } , and B∞ = {x | kxk∞ ≤ 1 } o n Ωk = Ω ∩ (xk + ∆k B∞ ) = x l k ≤ x ≤ u k . James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 19 / 33 Binding Constraints Active Constraints A(x) = {i | xi = li or xi = ui } Binding Constraints B(x) = i xi = li and (∇f (x))i > 0, or xi = ui and (∇f (x))i < 0 Non-Binding Constraints B c (x) = {1, 2, . . . , n} \ B(x), ν(x) = |B c (x)|. Φ(x) is an n × ν(x) matrix whose columns are those of the identity matrix corresponding to the non-binding constraints B c (x). James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 20 / 33 Active-set Trust-region Algorithm 1. Identify B(x k ) and set Φk = Φ(x k ). 2. Set H̃k = ΦTk Hk Φk , g̃ k = ΦTk g k , l̃ k = ΦTk l k , and ũ k = ΦTk u k . 3. Solve trust-region subproblem 1 T 2 s H̃k s min + g̃ k s subject to l̃ k ≤ s ≤ ũ k 4. Form the ratio r = f (x k )−f (x k +s̄) qk (s̄) James Burke Andreas Weigmann Xu Liang () and update. TR-L-BFGS 4/7/2008 21 / 33 Trust Region Sub-Probroblem Reduction Let w = s − l̃ k , h = ũ k − l̃ k . min 1 T 2 s H̃k s s.t. l̃ k ≤ s ≤ ũ k + g̃ k s min 1 T 2 w H̃w s.t. 0 ≤ w ≤ h. ≡ + kT w . Since H̃ = ΦT HΦ is positive definite, this QP is convex. We solve using an interior point algorithm. James Burke Andreas Weigmann Xu Liang () W = diag (w) TR-L-BFGS 4/7/2008 22 / 33 Interior Point Newton Equations p = z − v − k − t U −1 e + U −1 V h + t W −1 e r ∆w = (H̃ + U −1 V + W −1 Z )−1 p, = −w + r ∆u = −w + h − u − ∆w, ∆v = t U −1 e − v − U −1 V ∆s, ∆z = −W −1 Z ∆w + t W −1 e − z. t = homotopy parameter. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 23 / 33 (H̃ + U −1 V + W −1 Z )−1 H̃ = ΦT HΦ = ΦT (λI − ΨΓ−1 ΨT )Φ (H̃ + U −1 V + W −1 Z )−1 = i−1 G−1 + G−1 (ΦT Ψ) Γ − (ΦT Ψ)T G−1 (ΦT Ψ) (ΦT Ψ)T G−1 h where G = λI + U −1 V + W −1 Z . James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 24 / 33 Triangular Factorization Write " Γ− (ΦTk Ψ)T G−1 (ΦTk Ψ) = −D̂ L̂T L̂ Ŵ # . Compute Cholesky factors D̂ = MM T and Ŵ + L̂D̂ −1 L̂T = JJ T . Then Γ − (ΦTk Ψ)T G−1 (ΦTk Ψ) = James Burke Andreas Weigmann Xu Liang () M −L̂M −T TR-L-BFGS 0 J −M T 0 M −1 L̂T JT 4/7/2008 . 25 / 33 Numerical Results TEST SET: 23 problems from CUTEr set. n ≥ 1000. 21 problems have dimension ≥ 10000. Termination Criteria: fbest = best known function value. 1. |f k − fbest |/ max(1, |fbest |) ≤ . 2. nf ≥ 1000. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 26 / 33 Comparison of nfg between L-BFGS-B and ASTRAL 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 P P L-BFGS-B: Nocedal-Zhu-Byrd-Liu (1997) 0.5 0.4 0.5 0.4 0.3 0.3 0.2 0.2 0.1 0.1 ASTRAL LBFGSB 0 0 0.5 1 1.5 2 2.5 3 3.5 4 ASTRAL LBFGSB 0 0 tau 1 2 3 4 5 6 tau Figure 1a Figure 1b Figure 1a. Performance profiles, sum of the function and gradient evaluations, relative accuracy 10−5 . Figure 1b. Performance profiles, sum of the function and gradient evaluations, relative accuracy 10−3 . James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 27 / 33 Comparison of CPU time Note that the average cost of each function evaluation in our test set is 0.002s, and the average cost of each gradient evaluation is 0.008s. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 28 / 33 Comparison of CPU time Note that the average cost of each function evaluation in our test set is 0.002s, and the average cost of each gradient evaluation is 0.008s. 1 0.9 0.8 0.7 P 0.6 0.5 0.4 0.3 0.2 0.1 ASTRAL LBFGSB 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 tau James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 28 / 33 1 0.9 0.8 0.8 0.7 0.7 0.6 0.6 P P 1 0.9 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 ASTRAL LBFGSB 0 0 0.5 1 1.5 2 2.5 3 3.5 4 ASTRAL LBFGSB 0 4.5 tau 0.5 1 1.5 2 2.5 3 3.5 4 tau cpu(f) = 0.02s, cpu(g) = 0.08s James Burke Andreas Weigmann Xu Liang () 0 TR-L-BFGS cpu(f) = 0.04s, cpu(g) = 0.16s 4/7/2008 29 / 33 James Burke Andreas Weigmann Xu Liang () Thank You TR-L-BFGS 4/7/2008 30 / 33 Reference. 1. Limited Memory BFGS Updating in a Trust–Region Framework for Unconstrained Optimization, with James V. Burke, and Andreas Wiegmann. 2. ASTRAL: An Active Set l∞ -Trust-Region Algorithm for Box Constrained Optimization, with James V. Burke. James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 31 / 33 Literature Review: Hager and Zhang(2006) Zhang and Hager(2004) Gilbert and Nocedal(1992) Liu and Nocedal(1989) Nash(1984) James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 32 / 33 Previous Work • Hager and Zhang(2006) • Birgin and Martínez(SPG, 2001, 2002) • Lin and Moré(TRON, 1999) • Coleman and Li(1994, 1996) • Byrd, Lu, Norcedal, and Zhu(l-bfgsb, 1995, 1998) •Conn, Gould, and Toint(1988) James Burke Andreas Weigmann Xu Liang () TR-L-BFGS 4/7/2008 33 / 33