Adaptive Barrier Strategies for Nonlinear Interior Methods
Jorge Nocedal∗
Andreas Wächter†
Richard A. Waltz∗
February 25, 2005
Abstract
This paper considers strategies for selecting the barrier parameter at every iteration
of an interior-point method for nonlinear programming. Numerical experiments suggest
that adaptive choices, such as Mehrotra’s probing procedure, outperform static strategies that hold the barrier parameter fixed until a barrier optimality test is satisfied. A
new adaptive strategy is proposed based on the minimization of a quality function. The
paper also proposes a globalization framework that ensures the convergence of adaptive
interior methods. The barrier update strategies proposed in this paper are applicable to a wide class of interior methods and are tested in the two distinct algorithmic
frameworks provided by the ipopt and knitro software packages.
1 Introduction
In this paper we describe interior methods for nonlinear programming that update the barrier parameter adaptively, as the iteration progresses. The goal is to design algorithms that
are scale invariant and efficient in practice, and that enjoy global convergence guarantees.
The adaptive strategies studied in this paper allow the barrier parameter to increase or
decrease at every iteration and provide an alternative to the so-called Fiacco-McCormick
approach that fixes the barrier parameter until an approximate solution of the barrier problem is computed. Our motivation for this work stems from our belief that robust interior
methods for nonlinear programming must be able to react swiftly to changes of scale in the
problem and to correct overly aggressive decreases in the barrier parameter.
Adaptive barrier update strategies are well established in interior methods for linear
and convex quadratic programming. The most popular approach of this type is Mehrotra’s
predictor-corrector method [17]. It computes, at every iteration, a probing (affine scaling) step that determines a target value of the barrier parameter, and then takes a primal-dual step using this target value. A corrector step is added to better follow the trajectory to the solution. Mehrotra's method has proved to be very effective for linear and convex quadratic programming, but it cannot be shown to be globally convergent. Moreover, its reliability is heavily dependent upon an appropriate choice of the starting point.

∗ Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208-3118, USA. These authors were supported by National Science Foundation grants CCR-0219438 and DMI-0422132, and Department of Energy grant DE-FG02-87ER25047-A004.
† Department of Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.
When solving nonlinear nonconvex programming problems, much caution must be exercised to prevent the iteration from failing. Non-minimizing stationary points can attract the
iteration, and aggressive decreases in the barrier parameter can lead to failure. Our numerical experience shows that the direct extension of Mehrotra’s predictor-corrector method to
nonlinear programming does not result in a robust method. As we discuss below, the main
source of instability is the corrector step. Other adaptive barrier update strategies have
been proposed specifically for nonlinear programming [7, 10, 19, 20, 21].
In this paper we propose new strategies for updating the barrier parameter that are
effective in practice and that are supported by a global convergence analysis. To show the
generality of our techniques, we implement them in the two different algorithmic contexts
provided by the ipopt [23] and knitro [24] software packages.
The global convergence properties of interior methods for nonlinear programming have
recently received much attention [2, 7, 12, 13, 15, 19, 20, 22, 28]. Some of these studies
focus on the effects of merit functions or filters, and on regularization techniques. With the
exception of [7, 19, 20], however, these papers do not consider the numerical or theoretical
properties of adaptive barrier update techniques.
Notation. For any vector z, we denote by Z the diagonal matrix whose diagonal entries
are given by z. We let e denote the vector of ones, of appropriate dimension, that is,
e = (1, 1, · · · , 1)T .
2 Primal-Dual Nonlinear Interior Methods
The problem under consideration will be written as
\begin{align}
\min_x \quad & f(x) \tag{2.1a}\\
\text{s.t.} \quad & c(x) = 0 \tag{2.1b}\\
& x \ge 0, \tag{2.1c}
\end{align}
where f : R^n → R and c : R^n → R^m are twice continuously differentiable functions. For
conciseness we will refer to interior-point methods for nonlinear programming as “nonlinear
interior methods.” A variety of these methods have been proposed in the last 10 years; they
differ mainly in some aspects of the step computation and in the globalization scheme. Most
of the nonlinear interior methods are related to the simple primal-dual iteration described
next, and therefore, our discussion of barrier parameter choices will be phrased in the
context of this iteration.
We associate with the nonlinear program (2.1) the barrier problem
\begin{align}
\min_x \quad & \varphi_\mu(x) \equiv f(x) - \mu \sum_{i=1}^{n} \ln x_i \tag{2.2a}\\
\text{s.t.} \quad & c(x) = 0, \tag{2.2b}
\end{align}
where µ > 0 is the barrier parameter. As is well known, the KKT conditions of the barrier
problem (2.2) can be written as
\begin{align}
\nabla f(x) - A(x)^T y - z &= 0 \tag{2.3a}\\
Xz - \mu e &= 0 \tag{2.3b}\\
c(x) &= 0, \tag{2.3c}
\end{align}
together with
\begin{equation}
x \ge 0, \qquad z \ge 0. \tag{2.4}
\end{equation}
Here A(x) denotes the Jacobian matrix of the constraint functions c(x).
Applying Newton’s method to (2.3), in the variables (x, y, z), gives the primal-dual
system
 2




∇xx L −A(x)T −I
∆x
∇f (x) − A(x)T y − z
 Z
,
0
X   ∆y  = − 
Xz − µe
(2.5)
A(x)
0
0
∆z
c(x)
where L denotes the Lagrangian of the nonlinear program, that is,
L(x, y, z) = f (x) − y T c(x) − z T x.
(2.6)
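To make the step computation concrete, the following is a minimal dense linear-algebra sketch of solving (2.5); the argument names (grad_f, jac_c, hess_lag, and so on) are illustrative placeholders rather than the interface of any particular solver, and a production implementation would exploit sparsity and a symmetric indefinite factorization.

```python
import numpy as np

def primal_dual_step(x, y, z, mu, grad_f, c, jac_c, hess_lag):
    """Solve the primal-dual system (2.5) for (dx, dy, dz).

    grad_f   : gradient of f at x, shape (n,)
    c        : constraint values c(x), shape (m,)
    jac_c    : constraint Jacobian A(x), shape (m, n)
    hess_lag : Hessian of the Lagrangian (2.6), shape (n, n)
    """
    n, m = x.size, c.size
    K = np.block([
        [hess_lag,   -jac_c.T,           -np.eye(n)       ],
        [np.diag(z),  np.zeros((n, m)),   np.diag(x)      ],
        [jac_c,       np.zeros((m, m)),   np.zeros((m, n))],
    ])
    rhs = -np.concatenate([
        grad_f - jac_c.T @ y - z,   # dual infeasibility
        x * z - mu,                 # perturbed complementarity  Xz - mu*e
        c,                          # primal infeasibility
    ])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]
```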
After the step ∆ = (∆x, ∆y, ∆z) has been determined, we compute primal and dual steplengths, αp and αd, and define the new iterate (x+, y+, z+) as
\begin{equation}
x^+ = x + \alpha_p \Delta x, \qquad y^+ = y + \alpha_d \Delta y, \qquad z^+ = z + \alpha_d \Delta z. \tag{2.7}
\end{equation}
The steplengths are computed in two stages. First we compute
\begin{align}
\alpha_x^{\max} &= \max\{\alpha \in (0, 1] : x + \alpha \Delta x \ge (1 - \tau)x\} \tag{2.8a}\\
\alpha_z^{\max} &= \max\{\alpha \in (0, 1] : z + \alpha \Delta z \ge (1 - \tau)z\}, \tag{2.8b}
\end{align}
with τ ∈ (0, 1) (e.g., τ = 0.995). Next, we perform a backtracking line search that computes the final steplengths
\begin{equation}
\alpha_p \in (0, \alpha_x^{\max}], \qquad \alpha_d \in (0, \alpha_z^{\max}], \tag{2.9}
\end{equation}
providing sufficient decrease of a merit function or ensuring acceptability by a filter.
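A corresponding sketch of the fraction-to-the-boundary rule (2.8) and the update (2.7) is given below; it omits the backtracking line search (2.9), and the helper name max_step_to_boundary is merely illustrative.

```python
import numpy as np

def max_step_to_boundary(v, dv, tau=0.995):
    """Largest alpha in (0, 1] with v + alpha*dv >= (1 - tau)*v, cf. (2.8)."""
    neg = dv < 0.0
    if not np.any(neg):
        return 1.0
    return min(1.0, float(np.min(-tau * v[neg] / dv[neg])))

def new_iterate(x, y, z, dx, dy, dz, tau=0.995):
    """Apply (2.7) with the maximal steplengths; a merit-function or filter
    line search (2.9) would normally shorten alpha_p and alpha_d further."""
    alpha_p = max_step_to_boundary(x, dx, tau)   # alpha_x^max
    alpha_d = max_step_to_boundary(z, dz, tau)   # alpha_z^max
    return x + alpha_p * dx, y + alpha_d * dy, z + alpha_d * dz
```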
The other major ingredient in this simple primal-dual iteration is the procedure for
choosing the barrier parameter µ. Two types of barrier update strategies have been studied
in the literature: adaptive and static. Adaptive strategies [7, 10, 20, 21] allow changes in
the barrier parameter at every iteration, and often achieve scale invariance, but as already
mentioned they do not enjoy global convergence properties. (The analyses presented in
[7, 20] provide certain convergence results to stationary points, but these methods do not
promote convergence toward minimizers.)
The most important static strategy is the so-called Fiacco-McCormick approach that
fixes the barrier parameter until an approximate solution of the barrier problem is computed.
It has been employed in various nonlinear interior algorithms [1, 3, 9, 11, 23, 25, 27] and has
been implemented, for example, in the ipopt and knitro software packages. The Fiacco-McCormick strategy provides a framework for establishing global convergence [2, 22], but
suffers from important limitations. It can be very sensitive to the choice of the initial
point, the initial value of the barrier parameter and the scaling of the problem, and it
is often unable to recover when the iterates approach the boundary of the feasible region
prematurely. Our numerical experience with ipopt and knitro suggests that more dynamic
update strategies are needed to improve the robustness and speed of nonlinear interior
methods.
3 Choosing the Barrier Parameter
In this section we discuss two adaptive barrier strategies proposed in the literature and
compare them numerically with the static Fiacco-McCormick approach. These numerical
results will motivate the discussion of the following sections.
Given an iterate (x, y, z), consider an interior method that computes primal-dual search
directions by (2.5). The most common approach for choosing the barrier parameter µ
that appears on the right hand side of (2.5) is to make it proportional to the current
complementarity value, that is,
\begin{equation}
\mu = \sigma \, \frac{x^T z}{n}, \tag{3.1}
\end{equation}
where σ > 0 is a centering parameter and n denotes the number of variables. Mehrotra’s
predictor-corrector (MPC) method [17] for linear programming determines the value of σ
using a preliminary step computation (an affine scaling step). We now describe a direct
extension of Mehrotra’s strategy to the nonlinear programming case.
First, we calculate an affine scaling step
\begin{equation}
(\Delta x^{\rm aff}, \Delta y^{\rm aff}, \Delta z^{\rm aff}) \tag{3.2}
\end{equation}
by setting µ = 0 in (2.5), that is,
\begin{equation}
\begin{bmatrix} \nabla^2_{xx} L & -A(x)^T & -I \\ Z & 0 & X \\ A(x) & 0 & 0 \end{bmatrix}
\begin{bmatrix} \Delta x^{\rm aff} \\ \Delta y^{\rm aff} \\ \Delta z^{\rm aff} \end{bmatrix}
= - \begin{bmatrix} \nabla f(x) - A(x)^T y - z \\ Xz \\ c(x) \end{bmatrix}.
\tag{3.3}
\end{equation}
We then compute αxaff and αzaff to be the longest steplengths that can be taken along the
direction (3.2) before violating the non-negativity conditions (x, z) ≥ 0, with an upper
bound of 1. Explicit formulae for these values are given by (2.8) with τ = 1.
Next, we define µaff to be the value of complementarity that would be obtained by a
full step to the boundary, that is,
\begin{equation}
\mu^{\rm aff} = (x + \alpha_x^{\rm aff} \Delta x^{\rm aff})^T (z + \alpha_z^{\rm aff} \Delta z^{\rm aff}) / n, \tag{3.4}
\end{equation}
and set the centering parameter to be
\begin{equation}
\sigma = \left( \frac{\mu^{\rm aff}}{x^T z / n} \right)^{3}. \tag{3.5}
\end{equation}
This heuristic choice of σ is based on experimentation with linear programming problems,
and has proved to be effective for convex quadratic programming as well. Note that when
good progress is made along the affine scaling direction, we have µ^aff ≪ x^T z/n, so the σ obtained from this formula is small, and conversely.
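In code, the probing heuristic amounts to a few vector operations once the affine scaling step is available; the sketch below, with illustrative names, implements (3.1), (3.4) and (3.5).

```python
import numpy as np

def _alpha_max(v, dv):
    """Longest step in (0, 1] keeping v + alpha*dv >= 0, i.e. (2.8) with tau = 1."""
    neg = dv < 0.0
    return 1.0 if not np.any(neg) else min(1.0, float(np.min(-v[neg] / dv[neg])))

def mehrotra_mu(x, z, dx_aff, dz_aff):
    """Barrier parameter from Mehrotra's probing heuristic."""
    n = x.size
    mu_curr = x @ z / n
    ax, az = _alpha_max(x, dx_aff), _alpha_max(z, dz_aff)
    mu_aff = (x + ax * dx_aff) @ (z + az * dz_aff) / n   # (3.4)
    sigma = (mu_aff / mu_curr) ** 3                      # (3.5)
    return sigma * mu_curr                               # mu = sigma * x^T z / n, cf. (3.1)
```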
Mehrotra’s algorithm also computes a corrector step, but in this section we take the
view that the corrector is not part of the selection of the barrier parameter, and is simply a
mechanism for improving the quality of the step. In Section 7 we study the complete MPC
algorithm including the corrector step.
Other adaptive procedures of the form (3.1) have been proposed specifically for nonlinear
interior methods [7, 10, 20, 21]. The strategy employed in the loqo software package [21]
is particularly noteworthy because of its success in practice. It defines σ as
\begin{equation}
\sigma = 0.1 \, \min\left\{ 0.05 \, \frac{1 - \xi}{\xi}, \; 2 \right\}^{3}, \qquad \text{where} \quad \xi = \frac{\min_i \{x_i z_i\}}{x^T z / n}. \tag{3.6}
\end{equation}
Note that ξ measures the deviation of the smallest complementarity product x_i z_i from the average. When ξ = 1 (all individual products are equal to their average) we have σ = 0 and the algorithm takes an aggressive step. Note that the rule (3.6) always chooses σ ≤ 0.8, so that even though the value of µ may increase from one iteration to the next, it will never be chosen to be larger than the current complementarity value x^T z/n.
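The loqo rule is even cheaper to evaluate; a sketch, again with illustrative naming:

```python
import numpy as np

def loqo_mu(x, z):
    """Barrier parameter from the centering rule (3.1), (3.6) used in loqo."""
    avg = x @ z / x.size                      # average complementarity x^T z / n
    xi = np.min(x * z) / avg                  # deviation of the smallest product
    sigma = 0.1 * min(0.05 * (1.0 - xi) / xi, 2.0) ** 3
    return sigma * avg
```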
Let us compare the numerical performance of these two adaptive strategies with the
static Fiacco-McCormick approach. To better measure the impact of the choice of barrier
parameter, that is, to try to distinguish it from other algorithmic features, we use both
the ipopt and knitro software packages in our tests. These codes implement significantly
different variations of the simple primal-dual iteration (2.5). Ipopt was implemented for
these tests without a line search; only the fraction to the boundary rule (2.8) was imposed
on the primal and dual steplengths. For knitro we used the Interior/Direct option (we will
refer to this version as knitro-direct henceforth), which invokes a line search approach
that is occasionally safeguarded by a trust region iteration (for example, when negative
curvature is encountered) [25].
The barrier parameter strategies tested in our experiments are as follows:
• loqo rule. The barrier parameter is chosen by (3.1) and (3.6).
• Mehrotra probing. At every iteration, the barrier parameter µ is given by (3.1) and (3.5). Since this requires the computation of the affine scaling step (3.2), this strategy is more expensive than the loqo rule. For knitro-direct, in the iterations in which the safeguarding trust region algorithm is invoked, the barrier parameter is computed by the loqo rule instead of Mehrotra probing. This is done because Mehrotra probing is expensive to implement in the trust region algorithm, which uses a conjugate gradient iteration.
• MPC. The complete Mehrotra predictor-corrector algorithm as described in Section 7. As in the Mehrotra probing rule, when knitro-direct falls back on the safeguarded trust region algorithm, the barrier parameter is computed using the loqo rule for efficiency, and no corrector step is used.

• Monotone. (Also known as the Fiacco-McCormick approach.) The barrier parameter is fixed, and a series of primal-dual steps is computed until the optimality conditions for the barrier problem are satisfied to some accuracy. At this point the barrier parameter is decreased. ipopt and knitro implement somewhat different variations of this monotone approach; see [23, 25] for details about the initial value of µ, the rule for decreasing µ, and the form of the barrier stop tests.

Figure 1: Results for four barrier parameter updating strategies (Monotone, LOQO rule, Mehrotra probing, MPC). Performance profiles of (a) iteration counts for ipopt and (b) function evaluations for knitro; the horizontal axis measures "not more than 2^x times worse than the best option."
For the numerical comparison, we select all the nonlinear programming problems in the
CUTEr test set that contain at least one general inequality or bound constraint.¹ This gives
a total of 599 problems. Figure 1(a) reports the total number of iterations for ipopt, and
Figure 1(b) shows the total number of function evaluations for knitro, both comparing the
performance of the four barrier strategies. Measuring iterations and function evaluations gives similar profiles, and we provide both for greater variety. All the plots in the paper use
the logarithmic performance profiles proposed by Dolan and Moré [6]. To account for the
fact that different local solutions might be approached, problems with significantly different
final objective function values for successful runs were excluded.
¹Excluding those problems that seem infeasible, unbounded, or are given with initial points at which the model functions cannot be evaluated; see [23].

Note that the adaptive strategies are more robust than the monotone approach, and that Mehrotra probing appears to be the most successful in terms of iterations or function
evaluations. The complete MPC algorithm is very fast on some problems, but is not sufficiently robust. This is clear in Figure 1(a) which reports the performance of the pure MPC
strategy implemented in ipopt; it is less clear in Figure 1(b), but as mentioned above,
in the knitro implementation the MPC approach is replaced by the loqo rule (without
corrector step) when the safeguarded trust region step is invoked. The reason for the lack of
robustness of the MPC strategy will be discussed in Section 7. Motivated by these results,
we give further attention to adaptive choices of the barrier parameter.
4 Quality Functions
We now consider an approach that selects µ by approximately minimizing a quality function. We let µ = σ x^T z / n, where the centering parameter σ ≥ 0 is to be varied, and define
∆(σ) to be the solution of the primal-dual equations (2.5) as a function of σ. We also let
αxmax (σ), αzmax (σ) denote the steplengths satisfying the fraction to the boundary rule (2.8)
for ∆ = ∆(σ), and let
x(σ) = x + αxmax (σ)∆x(σ),
y(σ) = y + αzmax (σ)∆y(σ),
z(σ) = z + αzmax (σ)∆z(σ).
Our approach is to choose the value of σ that gives the best overall step, as measured by
the KKT conditions for the nonlinear program (2.1). For example, we could try to choose
σ so as to minimize the nonlinear quality function
\begin{equation}
q_N(\sigma) = \|\nabla f(x(\sigma)) - A(x(\sigma))^T y(\sigma) - z(\sigma)\|^2 + \|c(x(\sigma))\|^2 + \|Z(\sigma) X(\sigma) e\|^2. \tag{4.1}
\end{equation}
The evaluation of qN is, however, very expensive since it requires the evaluation of problem
functions and derivatives for every value of σ. A much more affordable alternative is to use
the linear quality function
\begin{equation}
q_L(\sigma) = (1 - \alpha_z^{\max}(\sigma))^2 \|\nabla f(x) - A(x)^T y - z\|^2 + (1 - \alpha_x^{\max}(\sigma))^2 \|c(x)\|^2 + \|(X + \alpha_x^{\max}(\sigma)\Delta X(\sigma))(Z + \alpha_z^{\max}(\sigma)\Delta Z(\sigma)) e\|^2, \tag{4.2}
\end{equation}
where ∆X(σ) is the diagonal matrix with ∆x(σ) on the diagonal, and similarly for ∆Z(σ).
The function qL has been obtained from (4.1) by assuming that primal and dual feasibility
(i.e. the first two terms in (4.1)) are linear functions, as is the case in linear programming.
Note that qL (σ) is not a convex function of σ because the steplengths αxmax , αzmax depend on σ
in a complicated manner. The dominant cost in the evaluation of qL lies in the computation
of the maximal steplengths αxmax , αzmax and the last term in (4.2), which requires a few vector
operations. We define the quality function using squared norms to severely penalize any
large components in the KKT error. In Section 6 we discuss the choice of norms and other
details of implementation. We have also experimented with a quadratic quality function
that is based on the KKT conditions for a quadratic program, and that like (4.2), does not
require additional function or gradient evaluations. However, it did not perform as well as
the linear quality function, for reasons that require further investigation.
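For illustration, q_L can be evaluated as follows once the direction ∆(σ) and the corresponding fraction-to-the-boundary steplengths are available; the sketch uses plain (unscaled) 2-norms, whereas the norms and scalings actually used in ipopt and knitro are described in Section 6, and the argument names are illustrative.

```python
import numpy as np

def quality_linear(dx, dz, alpha_x, alpha_z, dual_inf, c, x, z):
    """Evaluate the linear quality function q_L(sigma) of (4.2).

    dx, dz           : primal and dual-bound components of Delta(sigma)
    alpha_x, alpha_z : fraction-to-the-boundary steplengths for this sigma
    dual_inf         : current dual infeasibility  grad f(x) - A(x)^T y - z
    c                : current constraint values c(x)
    """
    comp = (x + alpha_x * dx) * (z + alpha_z * dz)   # predicted complementarity products
    return ((1.0 - alpha_z) ** 2 * dual_inf @ dual_inf
            + (1.0 - alpha_x) ** 2 * c @ c
            + comp @ comp)
```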
In Section 6 we describe a procedure for approximately minimizing the scalar function
qL (σ). Before presenting our numerical results with the quality function, we consider the
issue of how to guarantee the global convergence of nonlinear interior methods that choose
the barrier parameter adaptively.
5 A Globalization Method
The numerical results of Section 3 suggest that the adaptive strategies described there are quite robust. (We show in the next section that this is also the case for the quality function approach.) Yet, since the barrier parameter is allowed to change at every iteration
in these algorithms, there is no mechanism that enforces global convergence of the iterates.
In contrast, the monotone barrier strategy employed in the Fiacco-McCormick approach
allows us to establish global convergence results by combining two mechanisms. First, the
algorithms used to minimize a given barrier problem (2.2) use a line search or trust region
to enforce a decrease in the merit function (as in knitro) or to guarantee acceptability
by a filter (as in ipopt). This ensures that an optimality test for the barrier function is
eventually satisfied to some tolerance ε. Second, by repeating this minimization process for decreasing values of µ and ε that converge to zero, one can establish global convergence
results [2, 8] to stationary points of the nonlinear programming problem (2.1).
We now propose a globalization scheme that monitors the performance of the iterations
in reference to an optimality measure for the nonlinear program (2.1). As long as the
adaptive primal-dual steps make sufficient progress towards the solution, the algorithm is
free to choose a new value for the barrier parameter at every iteration. We call this the
free mode. However, if the iteration fails to maintain progress, then the algorithm reverts
to a monotone mode, in which a Fiacco-McCormick strategy is applied. Here, the value of
the barrier parameter remains fixed, and a robust globalization technique (e.g., based on
a merit function or a filter) is employed to ensure progress for the corresponding barrier
problem. Once the barrier problem is approximately minimized, the barrier parameter is
decreased. The monotone mode continues until an iterate is generated that makes sufficient
progress for the original problem, at which point the free mode resumes.
The goal of our globalization scheme is to interfere with adaptive steps as little as possible
because, as already noted, they result in fairly reliable iterations. As a measurement of the
progress in the optimization of the nonlinear program (2.1), we monitor the KKT error of
the original nonlinear program,
\begin{equation}
\Phi(x, y, z) = \|\nabla f(x) - A(x)^T y - z\|^2 + \|c(x)\|^2 + \|ZXe\|^2. \tag{5.1}
\end{equation}
We require that this measure be reduced by a factor of κ ∈ (0, 1) over at most a fixed number
of lmax iterations, as long as the algorithm is in the free mode. Note that a convergent
sequence of points (x, y, z) gives Φ(x, y, z) → 0 only if the limit point satisfies the first order
optimality conditions for the nonlinear program (2.1).
We now formally state the proposed globalization procedure.
Globalization Framework
Given (x0, y0, z0) with (x0, z0) > 0, a constant κ ∈ (0, 1) and an integer lmax ≥ 0.
Repeat
Choose a target value of the barrier parameter µk , based on any rule.
Compute the primal-dual search direction from (2.5).
Determine step sizes αp ∈ (0, αxmax ] and αd ∈ (0, αzmax ].
Compute the new trial iterate (x̃k+1 , ỹk+1 , z̃k+1 ) from (2.7).
Compute the KKT error Φ̃k+1 ≡ Φ(x̃k+1 , ỹk+1 , z̃k+1 ).
Set Mk = max{Φk−l , Φk−l+1 , . . . , Φk } with l = min{k, lmax }.
If Φ̃k+1 ≤ κMk
Accept (x̃k+1 , ỹk+1 , z̃k+1 ) as the new iterate, and set Φk+1 ← Φ̃k+1 .
Set k ← k + 1 and return to the beginning of the loop.
else
Start Monotone Mode:
Starting from (x̃k+1 , ỹk+1 , z̃k+1 ), and for an initial value µ̄ of the
barrier parameter, solve a sequence of barrier problems with a
monotonically decreasing sequence of barrier parameters less than µ̄,
to obtain a new iterate (xk+1 , yk+1 , zk+1 ) such that
Φk+1 ≡ Φ(xk+1 , yk+1 , zk+1 ) ≤ κMk .
Set k ← k + 1 and resume the free mode at the beginning of the loop.
end if
End (repeat).
Since this framework ensures that the optimality measure Φ is reduced by a factor
of κ < 1 at least once every lmax iterations, it is clear that Φk → 0. Consequently, all limit
points of the sequence of iterates satisfy the first order optimality conditions of the nonlinear
program (2.1).
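A schematic rendering of this framework in code may help fix ideas. In the sketch below, kkt_error implements the measure (5.1) with plain squared 2-norms, while adaptive_step and monotone_solve stand in for the solver-specific pieces (the adaptive primal-dual iteration and the Fiacco-McCormick inner loop, respectively); the names and the termination test are illustrative only.

```python
import numpy as np
from collections import deque

def kkt_error(grad_f, A, c, x, y, z):
    """Optimality measure Phi(x, y, z) of (5.1)."""
    dual = grad_f - A.T @ y - z
    comp = x * z
    return dual @ dual + c @ c + comp @ comp

def globalized_loop(iterate, phi, adaptive_step, monotone_solve,
                    kappa=0.8, l_max=2, tol=1e-8):
    """Free/monotone switching logic of the globalization framework."""
    history = deque([phi(iterate)], maxlen=l_max + 1)   # Phi_{k-l}, ..., Phi_k
    while history[-1] > tol:
        trial = adaptive_step(iterate)                   # free mode: any barrier rule
        M_k = max(history)
        if phi(trial) <= kappa * M_k:
            iterate = trial                              # sufficient progress: stay free
        else:
            # Monotone mode: Fiacco-McCormick iterations until Phi <= kappa * M_k.
            iterate = monotone_solve(trial, kappa * M_k)
        history.append(phi(iterate))
    return iterate
```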
Note that even in the free mode we might want to choose steplengths αp, αd that are
shorter than the maximal step sizes αxmax , αzmax . In our implementations, we perform a line
search to enforce progress in a merit function or a filter, both of which are defined with
respect to the barrier problem (2.2) corresponding to the current value µk. This is possible,
because the adaptive primal-dual search direction is compatible with this barrier problem.
The penalty parameter of the merit function, or the history in the filter, are reset at every
free iteration because the barrier problem itself changes at every free iteration.
In the monotone mode, it is not required to solve each barrier problem to the specified
tolerance before checking whether the method can revert to the free mode. Instead, we
compute the optimality error Φ(x, y, z) for all intermediate iterates in the monotone mode,
and return to the free mode as soon as Φ(x, y, z) ≤ κMk.
Finally, we point out that criteria other than an optimality measure Φ can be incorporated in the above globalization framework. For example, the decision of when to switch to
the monotone mode could be based on a two-dimensional filter involving the original objective function f(x) and the constraint violation ‖c(x)‖. In this setting, the algorithm would
stay in the free mode as long as the trial iterate reduces either the value of the objective
function or the constraint violation sufficiently compared to all previous (free) iterates. Note
that, by using this strategy, the algorithm might never return from the monotone mode.
We implemented this option in ipopt and observed that it was similar in performance to
the optimality measure Φ.
6 Numerical Results
In order to determine an appropriate value of the barrier parameter (3.1) using the quality
function approach, we need to (approximately) minimize qL (σ). Due to the complicated
dependence of the steplengths αxmax (σ), αzmax (σ) on the parameter σ, it does not seem possible to obtain an analytic solution to this one-dimensional minimization problem within a
reasonable amount of computation time. Therefore, we implemented search strategies using
only function evaluations of qL (σ). The search is relatively inexpensive since an evaluation
of qL requires only a few vector-vector operations. An important consideration is that σ be
allowed to take on values greater than one so that the algorithm can recover from overly
aggressive reductions of the barrier parameter.
The search heuristic implemented in ipopt uses a golden bisection procedure (see, e.g.
[16]), ignoring the fact that qL is nonconvex in general. First, golden bisection is performed
in the interval [σmin, 1], where σmin is chosen to correspond to a minimal permissible value of the barrier parameter µmin (in our implementations, µmin = 10⁻⁹). If the minimizer within this search interval appears to be at σ = 1, a second bisection search is performed in the interval [1, 100]. Each bisection procedure terminates if 10 evaluations of the quality function are performed, or if the search interval [a, b] becomes smaller than b × 10⁻³.
The search implemented in knitro proceeds in two phases. First a rough global search is performed by sampling values of σ distributed in the range [σmin, 1000] (the number of sample points depends on the value of the average complementarity at the current iterate, but a typical number of sample points is between 5 and 12). This search returns σ1, the trial point giving the lowest value of qL. If σ1 ≤ 0.99 or σ1 ≥ 100, then we take σ1 as the approximate minimizer of qL. Otherwise, we perform a more refined search in the range [0.5, 1] by testing the values σ = 1 − 1/2^l, l = 1, . . . , 5, and denote the best value from this search as σ2. We then take the better of σ1 and σ2 as the approximate minimizer. The idea behind the more refined search is the following. If the rough search results in a minimizer which is greater than 1, then before taking this value we want to check more thoroughly whether we can find a better value of σ < 1 which aims to decrease the current complementarity.
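As an example of such a derivative-free search, the sketch below mimics the two-phase procedure just described for knitro: a rough logarithmic sampling of [σmin, 1000] followed by the refinement near σ = 1. The callable q is any routine evaluating q_L, and the sample count and σmin default are illustrative assumptions.

```python
import numpy as np

def select_sigma(q, sigma_min=1e-6, n_samples=12):
    """Approximately minimize q_L(sigma) by sampling (Section 6, knitro-style)."""
    # Phase 1: rough global search on a logarithmic grid in [sigma_min, 1000].
    grid = np.logspace(np.log10(sigma_min), np.log10(1000.0), n_samples)
    sigma1 = float(min(grid, key=q))
    if sigma1 <= 0.99 or sigma1 >= 100.0:
        return sigma1
    # Phase 2: refine in [0.5, 1] with sigma = 1 - 1/2**l, l = 1, ..., 5.
    sigma2 = min((1.0 - 0.5 ** l for l in range(1, 6)), key=q)
    return sigma1 if q(sigma1) <= q(sigma2) else sigma2
```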
We now discuss the choice of norms in the quality function (4.2) and in the optimality
measure (5.1). (For consistency, we use the same norms and scaling factors in (4.2) and
(5.1).) In ipopt we use the 2-norm, and each of the three terms is divided by the number
of elements in the vectors whose norms are being computed. In knitro, the first two terms
in (4.2) and (5.1) use the infinity-norm, and the complementarity term uses the 1-norm
divided by n. In addition, in an attempt to make the function Φ less scale dependent,
knitro scales the second (primal infeasibility) term by the maximum constraint violation
at the initial point (if this value is greater than one), and scales the first and third terms
(dual infeasibility and complementarity) by max(1, ‖∇f(xk)‖∞). These are the scaling factors used in the knitro termination test [25].

Figure 2: Results for the NETLIB test set. Performance profiles of iteration counts (ipopt, panel (a)) and function evaluations (knitro, panel (b)) for Monotone (with globalization), Mehrotra probing (no globalization), Quality function (no globalization), and Quality function (with globalization).
The tests involving ipopt were run on a Dual-Pentium III, 1GHz machine running Linux.
The knitro tests were run on a machine with an AMD Athlon XP 3200+ 2.2GHz processor
running Linux. For both codes, the maximum number of iterations was set to 3000 and the
time limit was set to 1800 CPU seconds. No scaling of the test problems was performed by
the codes since one of our goals is to develop barrier parameter update strategies that are
insensitive to problem scaling. The tests were run using the latest development versions of
ipopt and knitro as of February 2005.
The first results are for the linear programming problems in the NETLIB collection, as
specified in the CUTEr test set [14]. No preprocessing was performed, and no initial point
strategy was employed (i.e., the default starting point x0 = (0, . . . , 0) was used). Figure 2
compares the performance of the quality function approach (both with and without the
globalization framework of Section 5) with two of the strategies described in Section 3,
namely the monotone method and the (unglobalized) Mehrotra probing heuristic. Even
though our focus is on nonlinear optimization, linear programming problems are of interest
since they allow us to assess the effectiveness of the quality function in a context in which
they exactly predict the KKT error. It is apparent from Figure 2 that the quality function
approach is very effective on the NETLIB test set, and that the globalization framework
produces only a slight decrease in efficiency. We note, however, that the quality function
approach requires extra work, and hence may not be the fastest in terms of CPU time.
The performance of the four barrier update strategies on nonlinear programming problems with at least one inequality or bound constraint from the CUTEr collection is reported
in Figure 3. The quality function approach again performs significantly better than the
monotone method, but this time its advantage over Mehrotra probing is less pronounced.
Figure 3: Results for the CUTEr test set. Performance profiles of iteration counts (ipopt, panel (a)) and function evaluations (knitro, panel (b)) for Monotone (with globalization), Mehrotra probing (no globalization), Quality function (no globalization), and Quality function (with globalization).
These results are of practical importance since the monotone method is currently the default
strategy in both ipopt and knitro; our tests suggest that significant improvements can be
expected by using adaptive strategies. We believe that the quality function approach shows
potential for future advances.
7 Corrector Steps
The numerical results of Section 3 indicate that, when solving nonlinear problems, including
the corrector step in Mehrotra’s method (the MPC method) is not beneficial. This is in stark
contrast with the experience in linear programming and convex quadratic programming,
where the corrector step is known to accelerate the interior-point iteration without degrading
its robustness. In this section we study the effect of the corrector step and find that it
can also be harmful in the linear programming and quadratic programming cases if an
initial point strategy is not used. These observations are relevant because in nonlinear
programming it is much more difficult to find a good starting point.
Let us begin by considering the linear programming case. There are several ways of
viewing the MPC method in this context. One is to consider the step computation as
taking place in three stages (see, e.g., [26]). First, the algorithm computes the affine scaling
step (3.2) and uses it to determine the target value of the barrier parameter µ = σ x^T z / n, where σ is given by (3.5). Next, the algorithm computes a primal-dual step, say ∆pd, from
(2.5) using that value of µ. Finally, a corrector step ∆corr is computed by solving (2.5) with
the right hand side given by
\begin{equation}
(0, \; \Delta X^{\rm aff} \Delta Z^{\rm aff} e, \; 0)^T, \tag{7.1}
\end{equation}
where ∆X aff is the diagonal matrix with diagonal entries given by ∆xaff , and similarly for
∆Z aff . The complete MPC step is the sum of the primal-dual and corrector steps. We can
compute it by adding the right hand sides and solving the following system:
\begin{equation}
\begin{bmatrix} \nabla^2_{xx} L & -A(x)^T & -I \\ Z & 0 & X \\ A(x) & 0 & 0 \end{bmatrix}
\begin{bmatrix} \Delta x^{\rm mpc} \\ \Delta y^{\rm mpc} \\ \Delta z^{\rm mpc} \end{bmatrix}
= - \begin{bmatrix} \nabla f(x) - A(x)^T y - z \\ Xz - \mu e + \Delta X^{\rm aff} \Delta Z^{\rm aff} e \\ c(x) \end{bmatrix}.
\tag{7.2}
\end{equation}
The new iterate (x+ , y + , z + ) of the MPC method is given by (2.7)-(2.8) with
∆ = (∆xmpc , ∆y mpc , ∆z mpc ).
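Because the coefficient matrix in (7.2) coincides with the one in (2.5) and (3.3), the predictor and the combined MPC step can share a single factorization. The sketch below makes the two right-hand sides explicit; it reuses the mehrotra_mu helper sketched in Section 3, a dense LU factorization stands in for the sparse symmetric factorization a real code would use, and all names are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mpc_step(K, dual_inf, c, x, z):
    """Combined Mehrotra predictor-corrector step via (7.2).

    K        : primal-dual matrix of (2.5)
    dual_inf : grad f(x) - A(x)^T y - z
    """
    n, m = x.size, c.size
    lu = lu_factor(K)                            # one factorization, two solves
    # Affine scaling (predictor) step: system (3.3), i.e. mu = 0.
    d_aff = lu_solve(lu, -np.concatenate([dual_inf, x * z, c]))
    dx_aff, dz_aff = d_aff[:n], d_aff[n + m:]
    # Target barrier parameter from the probing heuristic (3.4)-(3.5).
    mu = mehrotra_mu(x, z, dx_aff, dz_aff)
    # Right-hand side of (7.2): complementarity term Xz - mu*e + DX_aff DZ_aff e.
    comp = x * z - mu + dx_aff * dz_aff
    d = lu_solve(lu, -np.concatenate([dual_inf, comp, c]))
    return d[:n], d[n:n + m], d[n + m:]
```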
Alternative views of the MPC method are possible because of the linearity of the step computation: we can group the right hand side in (7.2) in different ways and thereby interpret the step as the sum of different components. Yet all these views point to an inconsistency
in the MPC approach, as we now discuss.
First of all, let us justify the definition of the corrector step. In the linear programming
case, primal and dual feasibility are linear functions and hence vanish at the full affine
scaling step given by
(x, y, z) + (∆xaff , ∆y aff , ∆z aff ).
(7.3)
The complementarity term takes on the following value at the full affine scaling step:
(X + ∆X aff )(Z + ∆Z aff ) = ∆X aff ∆Z aff .
It follows from this equation that the value of the right hand side vector in (2.5) at the
full affine scaling step (7.3) is given by (7.1). Thus the corrector step can be viewed as a
modified Newton step taken from the point (7.3) and using the primal-dual matrix evaluated
at the current iterate (x, y, z).
The inconsistency in the MPC approach arises because the corrector step, which is
designed to improve the full affine scaling step, is applied at the primal-dual point; see
Figure 4. In some circumstances, this mismatch can cause poor steps. In particular, we
have observed that if the affine scaling step is very long, in the sense that the steplengths
(2.8) are very small, and if the corrector step is even larger, then the addition of the corrector
step to the primal-dual step (2.7) can significantly increase the complementarity value x^T z.
This behaviour can be sustained and lead to very slow convergence or failure, as shown
in Table 1. The results in this table were obtained using PCx [5], an interior-point code
for linear programming that implements the MPC method, applied to problem Forplan
from the NETLIB collection. We disabled the initial point strategy of PCx and set the
initial point to x = e, z = e. Note from Table 1 that the affine scaling and corrector steps
appear to grow without bound, and examination of the results shows that the dual variables
diverge.
To provide further support to the claim that the corrector step can be harmful, we ran the complete set of test problems (94 in all) in the NETLIB collection. Using the default settings, which include a strategy for computing a good starting point, PCx solved 90 problems, and terminated very close to the solution in the remaining 4 cases. Next we disabled the initial point strategy and set the initial point to x = e, z = e.
Figure 4: An unfavorable corrector step. (Schematic showing the affine scaling step ∆x^aff, the primal-dual step ∆x^pd, the corrector step ∆x^corr, and the resulting step ∆x^mpc, together with the points x*(µ) and x*(0).)
Iter  Primal Obj   Dual Obj      PriInf   DualInf   αxmax     αzmax     log(x^T z/n)  ‖∆aff‖     ‖∆mpc‖
  0   9.0515e+01   -4.8813e+06   1.0e-00  1.9e+00   0.0e+00   0.0e+00    0.00         0.0e+00    0.0e+00
  1   9.0216e+01   -1.3664e+08   1.0e-00  1.9e+00   8.6e-13   5.0e-12    0.08         5.6e+06    1.2e+13
  2   9.0403e+01   -3.3916e+08   1.0e-00  1.9e+00   7.3e-13   4.8e-13    0.16         9.1e+07    1.9e+14
  3   9.0769e+01   -1.1343e+10   1.0e-00  1.9e+00   4.0e-12   1.2e-11    1.18         2.2e+08    3.9e+14
  4   9.0860e+01   -1.8010e+11   1.0e-00  1.9e+00   1.5e-12   5.0e-12    2.35         8.0e+09    1.4e+16
  5   9.1312e+01   -2.9307e+12   1.0e-00  1.9e+00   4.3e-12   5.1e-12    3.56         1.3e+11    2.2e+17
  6   9.1710e+01   -8.2787e+13   1.0e-00  1.9e+00   6.0e-12   9.1e-12    5.01         2.1e+12    3.6e+18
  7   9.2036e+01   -1.5505e+15   1.0e-00  1.9e+00   7.5e-12   6.0e-12    6.28         5.9e+13    1.0e+20
  8   9.2282e+01   -6.8149e+16   1.0e-00  1.9e+00   7.0e-12   1.4e-11    7.93         1.1e+15    1.9e+21
  9   9.2279e+01   -4.4155e+18   1.0e-00  1.9e+00   9.2e-12   2.1e-11    9.74         4.8e+16    8.3e+22
 10   9.2244e+01   -2.8697e+20   1.0e-00  1.9e+00   6.8e-12   2.1e-11   11.55         3.1e+18    5.4e+24
 11   9.2381e+01   -3.1118e+22   1.0e-00  1.9e+00   1.1e-11   3.6e-11   13.58         2.0e+20    3.5e+26
 12   9.2462e+01   -7.0471e+24   1.0e-00  2.2e+01   6.2e-12   7.6e-11   15.94         2.2e+22    3.8e+28
 13   9.2523e+01   -9.9820e+26   1.0e-00  2.8e+03   1.4e-11   4.7e-11   18.09         5.0e+24    8.6e+30
 14   9.2605e+01   -1.1959e+30   1.0e-00  2.2e+01   2.1e-11   4.0e-10   21.17         7.1e+26    1.2e+33

Table 1: Output for NETLIB problem Forplan for default PCx with a bad starting point.
Figure 5: Results on the NETLIB test set for three corrector step strategies implemented in PCx (default MPC corrector, no corrector, conditional MPC corrector); performance profile of iteration counts. The initial point was set to x = e, z = e.
PCx was now able to solve only 28 problems (and in only 3 additional cases terminated very close to the solution).
We repeated the experiment, using the initial point x = e, z = e, but this time removing
the corrector step; this corresponds to the algorithm called Mehrotra probing in Section 3.
We also tested a variant that we call conditional MPC in which the corrector step is employed
in the MPC method only if it does not result in an increase of complementarity by a factor
larger than 2. The results, in terms of iterations, are reported in Figure 5. Note the
dramatic increase in robustness of both strategies, compared with the MPC algorithm.
The conditional MPC strategy is motivated by the observation that harmful effects of the
corrector steps manifest themselves in a significant increase in complementarity.
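The safeguard itself is a one-line test. A sketch of one plausible form of it, assuming the trial point obtained with the corrected step has already been formed, is:

```python
import numpy as np

def accept_corrector(x, z, x_trial, z_trial, max_growth=2.0):
    """Conditional MPC test on numpy arrays: keep the corrector only if the
    complementarity at the corrected trial point does not exceed the current
    complementarity x^T z by more than max_growth.  Setting max_growth = 1.0
    (reject any increase) is the more conservative variant used for the
    nonlinear tests."""
    return float(np.dot(x_trial, z_trial)) <= max_growth * float(np.dot(x, z))
```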
Finally we compare the monotone and quality function approaches described in Section 3
with the conditional MPC approach on the nonlinear programming problems used in that
section. The conditional MPC method is now implemented so as to reject corrector steps
that increase complementarity (this more conservative approach appears to be more suitable
in the nonlinear case). Furthermore, if the conditional MPC step does not pass the merit
function or filter acceptance test for the current barrier problem, the corrector step is also
rejected, and the backtracking line search for the regular primal-dual step is executed.
The results, given in Figure 6, indicate that this conditional MPC method requires fewer iterations and function evaluations than the other strategies, and is no less robust.
Figure 6: Results for safeguarded corrector steps. Performance profiles of iteration counts (ipopt, panel (a)) and function evaluations (knitro, panel (b)) for Monotone (with globalization), Quality function (with globalization), and Quality function with corrector (with globalization).
8 Final Remarks
A multi-objective quality function was used by Mészáros [18] to estimate the optimal
steplengths for interior methods for quadratic programming. His solution technique is
based on exploiting the properties of the efficient frontier of the multi-objective optimization problem. The numerical results show a consistent (but modest) improvement over
using equal steplengths. Curtis et al. [4] study a quality function that is much closer to
the one discussed in Section 4. They formulate the problem of finding optimal steplengths
for quadratic programming using the quality function (4.1) specialized to the quadratic
programming case. In this paper we have gone beyond the approaches in [4, 18] in that
the quality function is used to determine the barrier parameter, and also indirectly, the
steplengths.
Acknowledgment. We would like to thank Richard Byrd for his many valuable comments and suggestions during the course of this work.
References
[1] J. Betts, S. K. Eldersveld, P. D. Frank, and J. G. Lewis. An interior-point nonlinear
programming algorithm for large scale optimization. Technical report MCT TECH003, Mathematics and Computing Technology, The Boeing Company, P.O. Box 3707,
Seattle WA 98124-2207, 2000.
[2] R. H. Byrd, J.-Ch. Gilbert, and J. Nocedal. A trust region method based on interior
point techniques for nonlinear programming. Mathematical Programming, 89(1):149–
185, 2000.
[3] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point algorithm for large scale
nonlinear programming. SIAM Journal on Optimization, 9(4):877–900, 1999.
[4] F. Curtis, J. Nocedal, and R. A. Waltz. Steplength selection in interior methods.
Technical report, Northwestern University, Evanston, IL, USA, 2005.
[5] J. Czyzyk, S. Mehrotra, and S. J. Wright. PCx User Guide. Technical report, Argonne
National Laboratory, Argonne, IL, USA, 1996.
[6] E. D. Dolan and J. J. Moré. Benchmarking optimization software with performance
profiles. Mathematical Programming, Series A, 91:201–213, 2002.
[7] A. S. El-Bakry, R. A. Tapia, T. Tsuchiya, and Y. Zhang. On the formulation and
theory of the Newton interior-point method for nonlinear programming. Journal of
Optimization Theory and Applications, 89(3):507–541, June 1996.
[8] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Unconstrained
Minimization Techniques. J. Wiley and Sons, Chichester, England, 1968. Reprinted as
Classics in Applied Mathematics 4, SIAM, Philadelphia, USA, 1990.
[9] A. Forsgren and P. E. Gill. Primal-dual interior methods for nonconvex nonlinear
programming. SIAM Journal on Optimization, 8(4):1132–1152, 1998.
[10] D. M. Gay, M. L. Overton, and M. H. Wright. A primal-dual interior method for
nonconvex nonlinear programming. In Y. Yuan, editor, Advances in Nonlinear Programming (Beijing, 1996), pages 31–56, Dordrecht, The Netherlands, 1998. Kluwer
Academic Publishers.
[11] E. M. Gertz and P. E. Gill. A primal-dual trust region algorithm for nonlinear programming. Numerical Analysis Report NA 02-1, University of California, San Diego,
2002.
[12] D. Goldfarb and L. Chen. Interior-point ℓ2 penalty methods for nonlinear programming
with strong global convergence properties. Technical report, IEOR Dept, Columbia
University, New York, NY 10027, 2004.
[13] N. I. M. Gould, D. Orban, and Ph. Toint. An interior-point l1-penalty method for
nonlinear optimization. Technical Report RAL-TR-2003-022, Rutherford Appleton
Laboratory Chilton, Oxfordshire, UK, 2003.
[14] N. I. M. Gould, D. Orban, and Ph. L. Toint. CUTEr and sifdec: A Constrained and
Unconstrained Testing Environment, revisited. ACM Trans. Math. Softw., 29(4):373–
394, 2003.
[15] I. Griva, D.F. Shanno, and R.J. Vanderbei. Convergence Analysis of a Primal-Dual
Method for Nonlinear Programming. Report, Department of Operations Research and
Financial Engineering, Princeton University, Princeton, NJ, USA, 2004.
[16] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley Publishing
Company, Reading, Massachusetts, USA, second edition, 1984.
[17] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM
Journal on Optimization, 2(4):575–601, 1992.
[18] C. Mészáros. Steplengths in infeasible primal-dual interior point algorithms of convex
quadratic programming. Technical Report DOC 97/7, Imperial College, London, UK.,
1997.
[19] A. L. Tits, A. Wächter, S. Bakhtiari, T. J. Urban, and C. T. Lawrence. A primal-dual interior-point method for nonlinear programming with strong global and local
convergence properties. SIAM Journal on Optimization, 14(1):173–199, 2003.
[20] M. Ulbrich, S. Ulbrich, and L. Vicente. A globally convergent primal-dual interior
point filter method for nonconvex nonlinear programming. Technical Report TR00-11,
Department of Mathematics, University of Coimbra, Coimbra, Portugal, 2000.
[21] R. J. Vanderbei and D. F. Shanno. An interior point algorithm for nonconvex nonlinear
programming. Computational Optimization and Applications, 13:231–252, 1999.
[22] A. Wächter and L. T. Biegler. Line search filter methods for nonlinear programming:
Motivation and global convergence. Technical Report RC 23036, IBM T. J. Watson
Research Center, Yorktown Heights, NY, USA, 2003.
[23] A. Wächter and L. T. Biegler. On the implementation of a primal-dual interior point
filter line search algorithm for large-scale nonlinear programming. Technical Report
RC 23149, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, March
2004.
[24] R. A. Waltz. Knitro 4.0 User’s Manual. Technical report, Ziena Optimization, Inc.,
Evanston, IL, USA, October 2004.
[25] R. A. Waltz, J. L. Morales, J. Nocedal, and D. Orban. An interior algorithm for
nonlinear optimization that combines line search and trust region steps. Technical
Report 2003-6, Optimization Technology Center, Northwestern University, Evanston,
IL, USA, June 2003. To appear in Mathematical Programming A.
[26] S. Wright. Primal-dual interior-point methods. Society for Industrial and Applied
Mathematics, Philadelphia, 1997.
[27] H. Yamashita. A globally convergent primal-dual interior-point method for constrained
optimization. Technical report, Mathematical System Institute, Inc., Tokyo, Japan,
May 1992. Revised March 1994.
[28] H. Yamashita, H. Yabe, and T. Tanabe. A globally and superlinearly convergent
primal-dual interior point trust region method for large scale constrained optimization.
Technical report, Mathematical System Institute, Inc., 2004. To appear in Mathematical Programming A.