Burke, James

Merging Trust-Region and Limited Memory
Technologies for Large-Scale Nonlinear Optimization
James Burke, Andreas Wiegmann, Xu Liang
April 7, 2008
TR-L-BFGS
Outline
1. Introduction.
• General format of large-scale nonlinear optimization problem.
• Main difficulties.
2. Large-scale unconstrained problem.
• Description of the algorithm.
• Numerical Algebra.
• Numerical Results.
3. Active-set Trust-region Algorithm (ASTRAL) for bound-constrained problems.
• Description of the algorithm.
• Numerical Algebra.
• Numerical Results.
Problem Format
$$\min f(x) \quad \text{s.t.} \quad l \le x \le u$$
Assumptions:
1. $f : \mathbb{R}^n \to \mathbb{R}$ is smooth, but difficult to evaluate.
• high-dimensional integration
• simulation
• systems of PDEs/ODEs
• multiple levels of optimization
2. $-\infty \le l_i \le u_i \le \infty$
3. $n$ is large ($n \ge 1000$; e.g. $n = 2^{19}$ for imaging applications).
Computational Costs
(1) Dominant Costs: function and gradient evaluations.
Jorge Moré and Nick Gould
No test sets now available for hard to evaluate functions.
(2) Numerical linear algebra.
A very significant concern due to dimensionality.
But, due to expensive functions, we want to exploit all data as fully
as possible.
Computational Costs
(1) Dominant Costs: function and gradient evaluations.
line-search vs trust-region
(2) Numerical linear algebra.
matrix multiplies vs equation solves
The Unconstrained Problem
Objective:
$$\min f(x)$$
Algorithm:
$$x^{k+1} = x^k + s^k$$
Notation:
$$g^k = \nabla f(x^k), \qquad B_k \approx \nabla^2 f(x^k)^{-1}, \qquad H_k \approx \nabla^2 f(x^k)$$
• Line-Search: $s^k = -\lambda_k B_k d^k$
$\lambda_k$ is a stepsize (weak or strong Wolfe conditions).
Requires repeated function and gradient evaluations.
• Trust-Region: $s^k$ solves
$$\min\ (g^k)^T s + \tfrac{1}{2} s^T H_k s \quad \text{s.t.} \quad \|s\| \le \Delta_k$$
Hessian Approximations
Scalar Secant Equation:
$$H_k = \frac{\nabla f(x^k) - \nabla f(x^{k-1})}{x^k - x^{k-1}}$$
Matrix Secant Equation:
$$H_k (x^k - x^{k-1}) = \nabla f(x^k) - \nabla f(x^{k-1})$$
$n$ linear equations in $\frac{n(n+1)}{2}$ unknowns
BFGS Update:
$H_0$ sym. and positive definite $\Rightarrow$ $H_k$ sym. and positive definite whenever $(s^k)^T y^k > 0$
$$H_{k+1} = H_k - \frac{H_k s^k (s^k)^T H_k}{(s^k)^T H_k s^k} + \frac{y^k (y^k)^T}{(s^k)^T y^k}$$
$$\phantom{H_{k+1}} = H_k - [\,y^k \ \ H_k s^k\,] \begin{pmatrix} -(s^k)^T y^k & 0 \\ 0 & (s^k)^T H_k s^k \end{pmatrix}^{-1} [\,y^k \ \ H_k s^k\,]^T$$
where
$$s^k = x^{k+1} - x^k \quad \text{and} \quad y^k = g^{k+1} - g^k.$$
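The BFGS formula can be checked on a small dense example. The following is a minimal sketch in plain Python, not the authors' implementation; the helper names (`matvec`, `dot`, `bfgs_update`) and the sample vectors are illustrative assumptions, and skipping the update when the curvature condition fails is one common convention rather than something the slide prescribes.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(H, v):
    return [dot(row, v) for row in H]

def bfgs_update(H, s, y):
    """Return H - (H s)(H s)^T / (s^T H s) + y y^T / (s^T y) for symmetric H."""
    sy = dot(s, y)
    if sy <= 0.0:
        # Assumption: leave H unchanged when s^T y > 0 does not hold.
        return [row[:] for row in H]
    Hs = matvec(H, s)
    sHs = dot(s, Hs)
    n = len(s)
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]

# Made-up data: H0 = I and one (s, y) pair with s^T y = 2.125 > 0.
H0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.5]
y = [2.0, 0.25]
H1 = bfgs_update(H0, s, y)

# The updated matrix satisfies the matrix secant equation H1 s = y.
residual = [a - b for a, b in zip(matvec(H1, s), y)]
```

The residual is zero up to rounding, which is the secant condition from the previous slide applied to the new pair.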
Compact m-Step Representation
Nocedal, Byrd and Schnabel (1994)
$$H_{k+m} = H_k - [\,Y \ \ H_k S\,] \begin{pmatrix} -D & L^T \\ L & S^T H_k S \end{pmatrix}^{-1} [\,Y \ \ H_k S\,]^T$$
where
$$S = [s^k, \cdots, s^{k+m-1}], \qquad Y = [y^k, \cdots, y^{k+m-1}],$$
and
$$S^T Y = L + D + R$$
with
$L$ strictly lower triangular,
$D$ diagonal, and
$R$ strictly upper triangular.
Limited Memory BFGS Updating
$$H = \lambda I - \Psi \Gamma^{-1} \Psi^T,$$
where
$$\lambda = \frac{(y^k)^T y^k}{(s^k)^T y^k}, \qquad \Psi = [\,Y,\ \lambda S\,], \qquad \Gamma = \begin{pmatrix} -D & L^T \\ L & \lambda S^T S \end{pmatrix}_{2m \times 2m},$$
$$S^T Y = L + D + R, \qquad S = [s^{k_1}, \cdots, s^{k_m}], \qquad Y = [y^{k_1}, \cdots, y^{k_m}].$$
Typically, $m = 5$.
$\lambda$-scaling: an approximate Rayleigh quotient
Oren-Spedicato (1976), Phua-Shanno (1980), Barzilai-Borwein (1988).
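To see how the compact form is applied without forming an n × n matrix, consider the smallest case m = 1, where L = 0 and Γ is a 2 × 2 diagonal matrix. The sketch below is an illustration under that assumption only; the function name `lbfgs_matvec_m1` and the sample vectors are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lbfgs_matvec_m1(s, y, v):
    """Apply H = lam*I - Psi Gamma^{-1} Psi^T to v when one pair (s, y) is stored."""
    sy = dot(s, y)
    lam = dot(y, y) / sy                      # lambda = y^T y / s^T y
    # Psi = [y, lam*s]; for m = 1, Gamma = diag(-s^T y, lam * s^T s).
    p = [dot(y, v), lam * dot(s, v)]          # Psi^T v (length 2m = 2)
    q = [p[0] / (-sy), p[1] / (lam * dot(s, s))]   # Gamma^{-1} Psi^T v
    # lam*v - Psi q, using only O(n) work.
    return [lam * vi - (yi * q[0] + lam * si * q[1])
            for vi, si, yi in zip(v, s, y)]

# Made-up pair with s^T y = 4.125 > 0.
s = [1.0, 0.5, -2.0]
y = [2.0, 0.25, -1.0]
Hs = lbfgs_matvec_m1(s, y, s)
secant_gap = max(abs(a - b) for a, b in zip(Hs, y))
```

Applying H to s reproduces y up to rounding, so the λ-scaled one-pair matrix still satisfies the secant equation. For m > 1 the same pattern holds, with the 2 × 2 diagonal solve replaced by a 2m × 2m solve with Γ.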
The Trust-Region Subproblem
$$\min\ (g^k)^T s + \tfrac{1}{2} s^T H_k s \quad \text{s.t.} \quad \|s\| \le \Delta_k$$
Apply Newton's method to find $\mu > 0$ so that
$$\phi(\mu) = 0,$$
where
$$\phi(\mu) = \frac{1}{\Delta_k} - \frac{1}{\|s(\mu)\|} \quad \text{and} \quad s(\mu) = -(\mu I + H_k)^{-1} g^k.$$
The Newton update is
$$\mu_+ = \mu - \frac{\phi(\mu)}{\phi'(\mu)}, \quad \text{where} \quad \phi'(\mu) = -\frac{g^T (\mu I + H)^{-3} g}{\|s(\mu)\|^3}.$$
(Our implementation avoids the hard case.)
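The Newton iteration on φ can be tried out on a toy problem. The sketch below assumes a diagonal, positive definite H so that (µI + H)⁻¹ is available in closed form; that simplification, the function name `tr_step_diag`, the tolerance, and the sample data are all assumptions for illustration, not the talk's limited-memory implementation.

```python
import math

def tr_step_diag(h, g, delta, tol=1e-10, max_iter=50):
    """Trust-region step for H = diag(h), h[i] > 0, via Newton on phi(mu)."""
    def step(mu):
        return [-gi / (mu + hi) for gi, hi in zip(g, h)]

    def norm(v):
        return math.sqrt(sum(vi * vi for vi in v))

    s = step(0.0)
    if norm(s) <= delta:
        return s, 0.0                      # unconstrained minimizer is feasible
    mu = 0.0
    for _ in range(max_iter):
        ns = norm(step(mu))
        phi = 1.0 / delta - 1.0 / ns
        if abs(phi) <= tol:
            break
        # g^T (mu I + H)^{-3} g, cheap because H is diagonal here.
        g3 = sum(gi * gi / (mu + hi) ** 3 for gi, hi in zip(g, h))
        dphi = -g3 / ns ** 3
        mu -= phi / dphi                   # Newton update mu_+ = mu - phi/phi'
    return step(mu), mu

# Made-up data: the full step (-4, -1) has norm sqrt(17) > delta = 1.
s_tr, mu_tr = tr_step_diag([1.0, 2.0], [4.0, 2.0], delta=1.0)
s_tr_norm = math.sqrt(sum(si * si for si in s_tr))
```

Starting from µ = 0 with a step that is too long, the iterates increase toward the root and the returned step lies on the trust-region boundary to within the tolerance.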
Powers of (µI + H)−1
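$$(\mu I + H)^{-1} = \frac{1}{\tau}\left[ I + \Psi (\tau\Gamma - \Psi^T\Psi)^{-1} \Psi^T \right]$$
$$(\mu I + H)^{-2} = \frac{1}{\tau^2}\left[ I + \Psi (\tau\Gamma - \Psi^T\Psi)^{-1} \Psi^T + \tau \Psi (\tau\Gamma - \Psi^T\Psi)^{-1} \Gamma (\tau\Gamma - \Psi^T\Psi)^{-1} \Psi^T \right],$$
where $\tau = \mu + \lambda$.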
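Both identities follow from the Sherman-Morrison-Woodbury formula applied to $H = \lambda I - \Psi \Gamma^{-1} \Psi^T$. The short derivation below is a sketch of that step; the abbreviation $A$ is introduced here for convenience and does not appear on the slides.

```latex
\begin{aligned}
\mu I + H &= \tau I - \Psi \Gamma^{-1} \Psi^T, \qquad \tau = \mu + \lambda, \\
(\mu I + H)^{-1}
  &= \tfrac{1}{\tau} I + \tfrac{1}{\tau^2}\, \Psi \left( \Gamma - \tfrac{1}{\tau} \Psi^T \Psi \right)^{-1} \Psi^T
   = \tfrac{1}{\tau} \left[ I + \Psi A \Psi^T \right],
  \qquad A = (\tau\Gamma - \Psi^T\Psi)^{-1}, \\
(\mu I + H)^{-2}
  &= \tfrac{1}{\tau^2} \left[ I + 2\, \Psi A \Psi^T + \Psi A\, (\Psi^T \Psi)\, A \Psi^T \right] \\
  &= \tfrac{1}{\tau^2} \left[ I + \Psi A \Psi^T + \tau\, \Psi A \Gamma A \Psi^T \right],
  \qquad \text{using } \Psi^T \Psi = \tau\Gamma - A^{-1}.
\end{aligned}
```

Every inverse that appears is of the $2m \times 2m$ matrix $\tau\Gamma - \Psi^T\Psi$, which is why $s(\mu)$ and $\phi'(\mu)$ can be evaluated without an $n \times n$ solve.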
Triangular Factorization of τ Γ − ΨT Ψ
Set
$$\hat D = (\mu + \lambda) D + Y^T Y, \qquad \hat W = \lambda\mu S^T S, \qquad \text{and} \qquad \hat L = \mu L - \lambda (R + D).$$
Compute Cholesky factorizations
$$\hat D = M M^T, \qquad \hat W + \hat L \hat D^{-1} \hat L^T = J J^T.$$
Then
$$\tau\Gamma - \Psi^T\Psi = \begin{pmatrix} M & 0 \\ -\hat L M^{-T} & J \end{pmatrix} \begin{pmatrix} -M^T & M^{-1} \hat L^T \\ 0 & J^T \end{pmatrix}.$$
The trust-region Newton iteration occurs entirely in dimension $2m$.
Cost $\approx O(mn)$ assuming $m^3 \ll n$.
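The block factorization can be sanity-checked in the scalar case m = 1, where L = R = 0 and D = sᵀy, so every hatted quantity is a number and the Cholesky factors are square roots. This is only a sketch under that assumption; the function name `factor_m1`, the value of µ, and the sample vectors are made up for illustration.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def factor_m1(s, y, mu):
    """Build tau*Gamma - Psi^T Psi for m = 1 directly and from its block factors."""
    d = dot(s, y)                          # D = s^T y
    lam = dot(y, y) / d
    tau = mu + lam
    ss, yy = dot(s, s), dot(y, y)
    # Direct evaluation with Psi = [y, lam*s] and Gamma = diag(-d, lam * s^T s).
    direct = [[-tau * d - yy, -lam * d],
              [-lam * d, tau * lam * ss - lam * lam * ss]]
    # Hatted quantities from the slide, reduced to scalars.
    D_hat = tau * d + yy
    W_hat = lam * mu * ss
    L_hat = -lam * d                       # mu*L - lam*(R + D) with L = R = 0
    M = math.sqrt(D_hat)
    J = math.sqrt(W_hat + L_hat * L_hat / D_hat)
    lower = [[M, 0.0], [-L_hat / M, J]]
    upper = [[-M, L_hat / M], [0.0, J]]
    product = [[sum(lower[i][k] * upper[k][j] for k in range(2))
                for j in range(2)] for i in range(2)]
    return direct, product

direct, product = factor_m1([1.0, 0.5, -2.0], [2.0, 0.25, -1.0], mu=0.3)
max_gap = max(abs(direct[i][j] - product[i][j]) for i in range(2) for j in range(2))
```

The two matrices agree up to rounding. The matrix is indefinite (its leading block is negative), which is why the factorization pairs a lower triangular factor with a sign-adjusted upper factor instead of using a single Cholesky factor.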
Numerical Results
TEST SET: 24 problems from the MINPACK-2 set, $2500 \le n \le 160{,}000$.
Termination Criteria:
$f_{best} \sim$ best known function value.
1. $|f^k - f_{best}| / \max(1, |f_{best}|) \le \epsilon$.
2. $nf \ge 1000$.
Performance Profile: (Dolan and Moré 2001)
Given a problem set $P$,
$$t_{i,s_j} = \text{nfg (or CPU time) of solving problem } i \text{ by algorithm } s_j, \qquad r_{i,s_j} = \frac{t_{i,s_j}}{\min_j t_{i,s_j}}.$$
Set
$$\rho_{s_j}(\tau) = \frac{1}{n_p} \left| \{ i \in P : r_{i,s_j} \le \tau \} \right|.$$
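The profile definition translates directly into a few lines of code. The sketch below is illustrative only: the function name `performance_profile` is hypothetical, the cost numbers are made up and are not the talk's results, and failed runs (which would need an infinite cost) are not handled.

```python
def performance_profile(costs, tau):
    """costs[solver] lists t_{i,s} over the same problems; returns rho_s(tau)."""
    solvers = list(costs)
    n_p = len(costs[solvers[0]])
    # Best cost per problem over all solvers: min_j t_{i,s_j}.
    best = [min(costs[s][i] for s in solvers) for i in range(n_p)]
    return {s: sum(1 for i in range(n_p) if costs[s][i] / best[i] <= tau) / n_p
            for s in solvers}

# Made-up evaluation counts for four problems and two solvers.
costs = {"TR": [10.0, 40.0, 30.0, 25.0],
         "LineSearch": [20.0, 20.0, 30.0, 100.0]}
rho1 = performance_profile(costs, 1.0)   # fraction of problems where each solver is best
rho2 = performance_profile(costs, 2.0)   # fraction solved within twice the best cost
```

With these numbers, `rho1` is 0.75 for TR and 0.5 for LineSearch, and `rho2` is 1.0 and 0.75. The value at τ = 1 counts wins (ties included), and the curve approaches the fraction of problems a solver can solve at all as τ grows.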
Comparison of the number of function and gradient evaluations
[Figure: performance profile for nfg, TR vs. LineSearch; horizontal axis tau, vertical axis P. (a) relative accuracy $10^{-5}$]
Comparison of the number of function and gradient evaluations
[Figure: performance profiles for nfg, TR vs. LineSearch; horizontal axis tau, vertical axis P. (c) relative accuracy $10^{-5}$; (d) relative accuracy $10^{-3}$]
Comparison of CPU time
[Figure: performance profiles for cpu, TR vs. LineSearch; horizontal axis tau, vertical axis P. (e) relative accuracy $10^{-5}$; (f) relative accuracy $10^{-3}$]
More Implementation Details
Initialization: $x^0$, $g^0 = \nabla f(x^0)$, $H_0 \succ 0$, $0 < \kappa < 1$, $0 < \sigma < 1$.
Iteration:
1. Set $\bar s = -H_k^{-1} g^k$ and $r(\bar s) = \dfrac{f(x^k + \bar s) - f(x^k)}{q(\bar s)}$. (98% acceptance)
2. WHILE $r(\bar s) < \kappa$
a. Let $\delta = \sigma \|\bar s\|$.
b. Solve
$$\min\ q(s) = s^T g^k + \tfrac{1}{2} s^T H_k s \quad \text{s.t.} \quad \|s\| \le \delta$$
and let $\bar s$ be the solution.
c. Compute $r(\bar s) = \dfrac{f(x^k + \bar s) - f(x^k)}{q(\bar s)}$.
END WHILE
3. Set $x^{k+1} = x^k + \bar s$. Update $H_{k+1}$.
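The control flow of this iteration (try the full quasi-Newton step first, shrink the radius only on rejection) can be sketched as follows. To keep the subproblem trivial, the sketch assumes $H_k = hI$, for which the trust-region solution is the full step clipped to radius δ; that choice, the function name `tr_iteration`, the iteration cap, and the test function are assumptions for illustration, and the Hessian update of step 3 is omitted.

```python
import math

def tr_iteration(f, grad, x, h=1.0, kappa=0.1, sigma=0.5, max_shrink=60):
    """One pass of steps 1-3 with H_k = h*I (an illustrative choice)."""
    g = grad(x)
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    if gnorm == 0.0:
        return x                               # already stationary
    fx = f(x)

    def q(s):                                  # q(s) = s^T g + 0.5 * h * s^T s
        return sum(si * gi for si, gi in zip(s, g)) + 0.5 * h * sum(si * si for si in s)

    def ratio(s):                              # r(s) = (f(x + s) - f(x)) / q(s)
        return (f([xi + si for xi, si in zip(x, s)]) - fx) / q(s)

    s = [-gi / h for gi in g]                  # step 1: full step -H^{-1} g
    for _ in range(max_shrink):
        if ratio(s) >= kappa:                  # step 2: loop while r < kappa
            break
        delta = sigma * math.sqrt(sum(si * si for si in s))
        # For H = h*I the subproblem solution is the full step clipped to delta.
        scale = min(1.0 / h, delta / gnorm)
        s = [-scale * gi for gi in g]
    return [xi + si for xi, si in zip(x, s)]   # step 3, without the H update

# Made-up test function f(x) = sum x_i^4, where the full step overshoots badly.
f = lambda x: sum(xi ** 4 for xi in x)
grad = lambda x: [4.0 * xi ** 3 for xi in x]
x0 = [2.0, 0.0]
x1 = tr_iteration(f, grad, x0)
```

Because the first trial step is the unconstrained one, an iteration costs a single extra function evaluation whenever that step is accepted, which is the situation the "(98% acceptance)" remark refers to.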
Minimization with bounded constraints
$$(P) \qquad \min f(x) \quad \text{s.t.} \quad l \le x \le u.$$
Active-set Trust-region Algorithm (ASTRAL).
We use an $\ell_\infty$ trust-region to conform with the constraint geometry.
$$\Omega = \{ x \mid l \le x \le u \} \quad \text{and} \quad B_\infty = \{ x \mid \|x\|_\infty \le 1 \},$$
$$\Omega_k = \Omega \cap (x^k + \Delta_k B_\infty) = \{ x \mid l^k \le x \le u^k \}.$$
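Because both sets are boxes, their intersection is again a box whose bounds are componentwise maxima and minima. A minimal sketch, with a hypothetical function name and made-up data:

```python
def trust_region_box(x, lower, upper, delta):
    """Bounds (l^k, u^k) of Omega ∩ (x + delta * B_inf), computed componentwise."""
    lk = [max(li, xi - delta) for li, xi in zip(lower, x)]
    uk = [min(ui, xi + delta) for ui, xi in zip(upper, x)]
    return lk, uk

inf = float("inf")
# First variable is bounded in [0, 1]; second variable is unbounded.
lk, uk = trust_region_box([0.5, 2.0], [0.0, -inf], [1.0, inf], delta=0.75)
```

Here `lk` is `[0.0, 1.25]` and `uk` is `[1.0, 2.75]`: the original bounds win for the first variable and the trust region wins for the second. With an ℓ₂ trust region the intersection would not be a box, which is the geometric point made on the slide.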
Binding Constraints
Active Constraints
$$A(x) = \{ i \mid x_i = l_i \ \text{or} \ x_i = u_i \}$$
Binding Constraints
$$B(x) = \{ i \mid x_i = l_i \ \text{and} \ (\nabla f(x))_i > 0, \ \text{or} \ x_i = u_i \ \text{and} \ (\nabla f(x))_i < 0 \}$$
Non-Binding Constraints
$$B^c(x) = \{1, 2, \ldots, n\} \setminus B(x), \qquad \nu(x) = |B^c(x)|.$$
$\Phi(x)$ is an $n \times \nu(x)$ matrix whose columns are those of the identity
matrix corresponding to the non-binding constraints $B^c(x)$.
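The binding set and the action of $\Phi^T$ can be illustrated on a small example. This sketch uses 0-based indices and exact floating-point comparison with the bounds, both of which are simplifying assumptions; the function names and data are hypothetical.

```python
def binding_set(x, g, lower, upper):
    """Indices at a bound where the gradient pushes further out of the box."""
    return [i for i in range(len(x))
            if (x[i] == lower[i] and g[i] > 0) or (x[i] == upper[i] and g[i] < 0)]

def restrict(v, free):
    """Phi^T v: keep only the entries indexed by the non-binding set."""
    return [v[i] for i in free]

x = [0.0, 1.0, 0.5, 0.0]
g = [2.0, -1.0, 0.3, -4.0]          # made-up gradient values
lower, upper = [0.0] * 4, [1.0] * 4

B = binding_set(x, g, lower, upper)
free = [i for i in range(len(x)) if i not in B]
g_tilde = restrict(g, free)
```

Variable 3 sits at its lower bound, so it is active, but its gradient component is negative, so moving into the box decreases f and the constraint is not binding. The result is `B == [0, 1]`, `free == [2, 3]`, and `g_tilde == [0.3, -4.0]`. Multiplying by $\Phi^T$ is just index selection, so $\tilde H_k$ never needs an explicit matrix product with $\Phi_k$.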
Active-set Trust-region Algorithm
1. Identify $B(x^k)$ and set $\Phi_k = \Phi(x^k)$.
2. Set
$$\tilde H_k = \Phi_k^T H_k \Phi_k, \qquad \tilde g^k = \Phi_k^T g^k, \qquad \tilde l^k = \Phi_k^T l^k, \qquad \text{and} \qquad \tilde u^k = \Phi_k^T u^k.$$
3. Solve the trust-region subproblem
$$\min\ \tfrac{1}{2} s^T \tilde H_k s + (\tilde g^k)^T s \quad \text{subject to} \quad \tilde l^k \le s \le \tilde u^k$$
4. Form the ratio $r = \dfrac{f(x^k) - f(x^k + \bar s)}{q_k(\bar s)}$ and update.
Trust-Region Subproblem Reduction
Let
$$w = s - \tilde l^k, \qquad h = \tilde u^k - \tilde l^k.$$
$$\min\ \tfrac{1}{2} s^T \tilde H_k s + (\tilde g^k)^T s \ \ \text{s.t.} \ \ \tilde l^k \le s \le \tilde u^k \quad \equiv \quad \min\ \tfrac{1}{2} w^T \tilde H w + k^T w \ \ \text{s.t.} \ \ 0 \le w \le h.$$
Since $\tilde H = \Phi^T H \Phi$ is positive definite, this QP is convex.
We solve using an interior point algorithm.
$$W = \mathrm{diag}(w)$$
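The vector $k$ is not defined on the slide. Substituting $s = w + \tilde l^k$ into the objective suggests the expression below; it is offered as a reconstruction from the change of variables, not as the authors' stated definition.

```latex
\begin{aligned}
\tfrac12 (w + \tilde l^k)^T \tilde H_k (w + \tilde l^k) + (\tilde g^k)^T (w + \tilde l^k)
  &= \tfrac12 w^T \tilde H_k w + \big( \tilde g^k + \tilde H_k \tilde l^k \big)^T w
     + \underbrace{\tfrac12 (\tilde l^k)^T \tilde H_k \tilde l^k + (\tilde g^k)^T \tilde l^k}_{\text{constant in } w}, \\
\text{so} \quad k &= \tilde g^k + \tilde H_k \tilde l^k,
\qquad
\tilde l^k \le s \le \tilde u^k \iff 0 \le w \le h .
\end{aligned}
```

The cross term uses the symmetry of $\tilde H_k$, and the constant does not affect the minimizer, so the two problems have the same solution up to the shift by $\tilde l^k$.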
Interior Point Newton Equations
$$p = z - v - k - t\,U^{-1} e + U^{-1} V h + t\,W^{-1} e$$
$$r = (\tilde H + U^{-1} V + W^{-1} Z)^{-1} p,$$
$$\Delta w = -w + r,$$
$$\Delta u = -w + h - u - \Delta w,$$
$$\Delta v = t\,U^{-1} e - v - U^{-1} V \Delta u,$$
$$\Delta z = -W^{-1} Z \Delta w + t\,W^{-1} e - z.$$
$t$ = homotopy parameter.
$(\tilde H + U^{-1} V + W^{-1} Z)^{-1}$
$$\tilde H = \Phi^T H \Phi = \Phi^T (\lambda I - \Psi \Gamma^{-1} \Psi^T) \Phi$$
$$(\tilde H + U^{-1} V + W^{-1} Z)^{-1} = G^{-1} + G^{-1} (\Phi^T \Psi) \left[ \Gamma - (\Phi^T \Psi)^T G^{-1} (\Phi^T \Psi) \right]^{-1} (\Phi^T \Psi)^T G^{-1}$$
where
$$G = \lambda I + U^{-1} V + W^{-1} Z.$$
Triangular Factorization
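Write
$$\Gamma - (\Phi_k^T \Psi)^T G^{-1} (\Phi_k^T \Psi) = \begin{pmatrix} -\hat D & \hat L^T \\ \hat L & \hat W \end{pmatrix}.$$
Compute Cholesky factors
$$\hat D = M M^T \quad \text{and} \quad \hat W + \hat L \hat D^{-1} \hat L^T = J J^T.$$
Then
$$\Gamma - (\Phi_k^T \Psi)^T G^{-1} (\Phi_k^T \Psi) = \begin{pmatrix} M & 0 \\ -\hat L M^{-T} & J \end{pmatrix} \begin{pmatrix} -M^T & M^{-1} \hat L^T \\ 0 & J^T \end{pmatrix}.$$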
Numerical Results
TEST SET: 23 problems from the CUTEr set, $n \ge 1000$.
21 problems have dimension $\ge 10000$.
Termination Criteria: $f_{best}$ = best known function value.
1. $|f^k - f_{best}| / \max(1, |f_{best}|) \le \epsilon$.
2. $nf \ge 1000$.
Comparison of nfg between L-BFGS-B and ASTRAL
L-BFGS-B: Nocedal-Zhu-Byrd-Liu (1997)
[Figures 1a and 1b: performance profiles, ASTRAL vs. LBFGSB; horizontal axis tau, vertical axis P]
Figure 1a. Performance profiles, sum of the function and gradient evaluations, relative
accuracy $10^{-5}$. Figure 1b. Performance profiles, sum of the function and gradient
evaluations, relative accuracy $10^{-3}$.
Comparison of CPU time
Note that the average cost of each function evaluation in our test set is
0.002s, and the average cost of each gradient evaluation is 0.008s.
[Figure: CPU-time performance profile, ASTRAL vs. LBFGSB; horizontal axis tau, vertical axis P]
[Figure: CPU-time performance profiles, ASTRAL vs. LBFGSB, in two panels; horizontal axis tau, vertical axis P. Panel labels: cpu(f) = 0.02s, cpu(g) = 0.08s; and cpu(f) = 0.04s, cpu(g) = 0.16s]
Thank You
Reference.
1. Limited Memory BFGS Updating in a Trust–Region Framework for
Unconstrained Optimization, with James V. Burke, and Andreas Wiegmann.
2. ASTRAL: An Active Set $\ell_\infty$-Trust-Region Algorithm for Box Constrained
Optimization, with James V. Burke.
Literature Review:
Hager and Zhang (2006)
Zhang and Hager (2004)
Gilbert and Nocedal (1992)
Liu and Nocedal (1989)
Nash (1984)
Previous Work
• Hager and Zhang (2006)
• Birgin and Martínez (SPG, 2001, 2002)
• Lin and Moré (TRON, 1999)
• Coleman and Li (1994, 1996)
• Byrd, Lu, Nocedal, and Zhu (L-BFGS-B, 1995, 1998)
• Conn, Gould, and Toint (1988)