Hindawi Publishing Corporation
Abstract and Applied Analysis
Volume 2013, Article ID 697474, 7 pages
http://dx.doi.org/10.1155/2013/697474
Research Article
An Approximate Quasi-Newton Bundle-Type Method for
Nonsmooth Optimization
Jie Shen,1 Li-Ping Pang,2 and Dan Li2
1 School of Mathematics, Liaoning Normal University, Dalian 116029, China
2 School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
Correspondence should be addressed to Jie Shen; tt010725@163.com
Received 22 January 2013; Revised 31 March 2013; Accepted 1 April 2013
Academic Editor: Gue Lee
Copyright © 2013 Jie Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
An implementable algorithm for solving a nonsmooth convex optimization problem is proposed by combining Moreau-Yosida
regularization with bundle and quasi-Newton ideas. In contrast with the quasi-Newton bundle methods of Mifflin et al. (1998), we only
assume that the values of the objective function and its subgradients are evaluated approximately, which makes the method easier
to implement. Under some reasonable assumptions, the proposed method is shown to have a Q-superlinear rate of convergence.
1. Introduction
In this paper we are concerned with the unconstrained minimization of a real-valued convex function $f : \mathbb{R}^n \to \mathbb{R}$, namely,
$$\min \; f(x) \quad \text{s.t.} \; x \in \mathbb{R}^n, \tag{1}$$
and in general 𝑓 is nondifferentiable. A number of attempts
have been made to obtain convergent algorithms for solving
(1). Fukushima and Qi [1] propose an algorithm for solving
(1) under semismoothness and regularity assumptions. The
proposed algorithm is shown to have a Q-superlinear rate of
convergence. An implementable BFGS method for general
nonsmooth problems is presented by Rauf and Fukushima
[2], and global convergence is obtained based on the assumption of strong convexity. A superlinearly convergent method
for (1) is proposed by Qi and Chen [3], but it requires
the semismoothness condition. He [4] obtains a globally
convergent algorithm for convex constrained minimization
problems under certain regularity and uniform continuity
assumptions. Among methods for nonsmooth optimization problems, some have a superlinear rate of convergence; see, for instance, Mifflin and Sagastizábal [5] and Lemaréchal et al. [6]. These works propose two conceptual algorithms with superlinear convergence for minimizing a class of convex functions; the latter demands that the objective function
𝑓 should be differentiable in a certain space π‘ˆ (the subspace
along which πœ•π‘“(𝑝) has 0 breadth at a given point 𝑝), but
sometimes it is difficult to decompose the space. Besides these
methods mentioned above, there is the quasi-Newton bundle-type method proposed by Mifflin et al. [7]; it has a superlinear rate of convergence, but the exact values of the objective
function and its subgradients are required. In this paper, we
present an implementable algorithm by using bundle and
quasi-Newton ideas and Moreau-Yosida regularization, and
the proposed algorithm can be shown to have a superlinear
rate of convergence. An obvious advantage of the proposed
algorithm lies in the fact that we only need the approximate
values of the objective function and its subgradients.
It is well known that (1) can be solved by means of the
Moreau-Yosida regularization $F : \mathbb{R}^n \to \mathbb{R}$ of $f$, which is defined by
$$F(x) = \min_{z \in \mathbb{R}^n} \left\{ f(z) + (2\lambda)^{-1} \|z - x\|^2 \right\}, \tag{2}$$
where πœ† is a fixed positive parameter and β€– ⋅ β€– denotes the
Euclidean norm or its induced matrix norm on 𝑅𝑛×𝑛 . The
problem of minimizing $F(x)$, that is,
$$\min \; F(x) \quad \text{s.t.} \; x \in \mathbb{R}^n, \tag{3}$$
is equivalent to (1) in the sense that π‘₯ ∈ 𝑅𝑛 solves (1) if and
only if it solves (3), see Hiriart-Urruty and Lemaréchal [8].
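For a concrete instance (our illustration, not an example from the paper), take $f = \|\cdot\|_1$: the minimizer $p(x)$ of (2) is given componentwise by soft-thresholding, so $F$ and its gradient $G(x) = \lambda^{-1}(x - p(x))$ (formula (4) below) are available in closed form.

```python
import numpy as np

def prox_l1(x, lam):
    """p(x) = argmin_z { ||z||_1 + (2*lam)^(-1) * ||z - x||^2 },
    i.e., componentwise soft-thresholding at level lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_yosida_l1(x, lam):
    """Return F(x) from (2) and its gradient G(x) = (x - p(x)) / lam
    for the special case f = ||.||_1."""
    p = prox_l1(x, lam)
    return np.abs(p).sum() + (x - p) @ (x - p) / (2 * lam), (x - p) / lam
```

For a general $f$ no such closed form exists, which is exactly why the approximations developed in this paper are needed.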
The problem (3) has a remarkable feature that the objective
function 𝐹 is a differentiable convex function, even though 𝑓
is nondifferentiable. Moreover 𝐹 has a Lipschitz continuous
gradient
𝐺 (π‘₯) = πœ†−1 (π‘₯ − 𝑝 (π‘₯)) ∈ πœ•π‘“ (𝑝 (π‘₯)) ,
πœ•π΅ 𝐺 (π‘₯) = {𝐷 ∈ 𝑅𝑛×𝑛 | 𝐷 = lim ∇𝐺 (π‘₯π‘˜ ) ,
π‘₯π‘˜ → π‘₯
(5)
is nonempty and bounded for each π‘₯. We say 𝐺 is BDregular at π‘₯ if all matrices 𝐷 ∈ πœ•π΅ 𝐺(π‘₯) are nonsingular. It
is reasonable to pay more attention to the problem (3) since
𝐹 has such good properties. However, because the MoreauYosida regularization itself is defined through a minimization
problem involving 𝑓, the exact values of 𝐹 and its gradient
𝐺 at an arbitrary point π‘₯ are difficult or even impossible
to compute in general. Therefore, we attempt to explore the
possibility of utilizing the approximations of these values.
Several attempts have been made to combine quasiNewton idea with Moreau-Yosida regularization to solve (1).
For related works on this subject, see Chen and Fukushima
[9] and Mifflin [10]. In particular, Mifflin et al. [7] consider
using bundle ideas to approximate linearly the values of 𝑓 in
order to approximate 𝐹 in which the exact values of 𝑓 and one
of its subgradients 𝑔 at some points are needed. In this paper
we assume that for given π‘₯ ∈ 𝑅𝑛 and πœ€ ≥ 0, we can find some
𝑓̃ ∈ 𝑅 and π‘”π‘Ž (π‘₯, πœ€) ∈ 𝑅𝑛 such that
𝑓 (π‘₯) ≥ 𝑓̃ ≥ 𝑓 (π‘₯) − πœ€,
∀𝑧 ∈ 𝑅𝑛 ,
(6)
which means that π‘”π‘Ž (π‘₯, πœ€) ∈ πœ•πœ€ 𝑓(π‘₯). This setting is realistic in
many applications, see Kiwiel [11]. Let us see some examples.
Assume that 𝑓 is strongly convex with modulus πœ‡ > 0, that
is,
πœ‡
‖𝑧 − π‘₯β€–2 ≤ 𝑓 (𝑧) ,
2
∀𝑧, π‘₯ ∈ 𝑅 ,
σ΅„©
σ΅„© σ΅„©σ΅„©
+ σ΅„©σ΅„©σ΅„©πœ‰σ΅„©σ΅„©σ΅„© σ΅„©σ΅„©σ΅„©∇β„Ž V (π‘₯) − ∇V (π‘₯)σ΅„©σ΅„©σ΅„© ‖𝑧 − π‘₯β€–
≤ 𝑓 (𝑧) −
(7)
𝑔 (π‘₯) ∈ πœ•π‘“ (π‘₯) ,
and that 𝑓(π‘₯) = 𝑀(V(π‘₯)) with V : 𝑅𝑛 → π‘…π‘š continuously
differentiable and 𝑀 : π‘…π‘š → 𝑅 convex. By the chain rule
𝑇
we have πœ•π‘“(π‘₯) = {∑π‘š
∈
𝑖=1 πœ‰π‘– ∇V𝑖 (π‘₯) | πœ‰ = (πœ‰1 , πœ‰2 , . . . , πœ‰π‘› )
πœ•π‘€(V(π‘₯))}. Now assume that we have an approximation
∇β„Ž V(π‘₯) of ∇V(π‘₯) such that β€–∇β„Ž V(π‘₯)−∇V(π‘₯)β€–≤ πœ…(β„Ž), β„Ž > 0. Such
an approximation may be obtained by using finite differences.
(8)
πœ‡
σ΅„© σ΅„©
‖𝑧 − π‘₯β€–2 + πœ… (β„Ž) σ΅„©σ΅„©σ΅„©πœ‰σ΅„©σ΅„©σ΅„© ‖𝑧 − π‘₯β€–
2
for all π‘₯, 𝑧 ∈ 𝑅𝑛 and 𝑔(π‘₯) = ∑π‘š
𝑖=1 πœ‰π‘– ∇V𝑖 (π‘₯) ∈ πœ•π‘“(π‘₯). Some
simple manipulations show that
πœ‡
σ΅„© σ΅„©
‖𝑧 − π‘₯β€–2 + πœ… (β„Ž) σ΅„©σ΅„©σ΅„©πœ‰σ΅„©σ΅„©σ΅„© ‖𝑧 − π‘₯β€–
2
≤
where 𝐺 is differentiable at π‘₯ }
𝑛
≤ 𝑓 (π‘₯) + 𝑔(π‘₯)𝑇 (𝑧 − π‘₯)
−
π‘˜
𝑓 (π‘₯) + 𝑔(π‘₯)𝑇 (𝑧 − π‘₯) +
𝑓 (π‘₯) + π‘”β„Ž (π‘₯)𝑇 (𝑧 − π‘₯)
(4)
where 𝑝(π‘₯) is the unique minimizer of (2) and πœ•π‘“ is
the subdifferential mapping of 𝑓. Hence, by Rademacher’s
theorem, 𝐺 is differentiable almost everywhere and the set
𝑓 (𝑧) ≥ 𝑓̃ + βŸ¨π‘”π‘Ž (π‘₯, πœ€) , 𝑧 − π‘₯⟩ ,
In this case, typically πœ…(β„Ž) → 0 for β„Ž → 0. Let π‘”β„Ž (π‘₯) =
∑π‘š
𝑖=1 πœ‰π‘– ∇β„Ž V𝑖 (π‘₯), πœ‰ ∈ πœ•π‘€(V(π‘₯)). Then, we have
1 σ΅„©σ΅„© σ΅„©σ΅„©2
2
σ΅„©πœ‰σ΅„© πœ…(β„Ž) =: πœ€β„Ž ,
2πœ‡ σ΅„© σ΅„©
∀π‘₯, 𝑧 ∈ 𝑅𝑛 .
(9)
By the definition of πœ‰, the bound πœ€β„Ž depends on π‘₯, we obtain
𝑓 (π‘₯) + π‘”β„Ž (π‘₯)𝑇 (𝑧 − π‘₯) ≤ 𝑓 (𝑧) + πœ€β„Ž ,
∀𝑧 ∈ 𝑅𝑛 .
(10)
From the local boundedness of πœ•π‘€(V(π‘₯)), we infer that πœ€β„Ž > 0
is locally bounded. Thus, π‘”β„Ž (π‘₯) is an πœ€β„Ž -subgradient of 𝑓 at π‘₯,
see Hintermüller [12]. As for the approximate function values,
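A minimal numerical sketch of this finite-difference construction (our own illustration; the functions $v$ and $w = \max$ below are assumed examples, not taken from the paper): since each $v_i$ is smooth and convex and $w = \max$ is convex and nondecreasing, $f = w \circ v$ is convex, and the forward-difference Jacobian gives $\kappa(h) = O(h)$, so $g_h(x)$ satisfies the $\varepsilon_h$-subgradient inequality (10) up to a tiny tolerance.

```python
import numpy as np

def v(x):   # smooth convex components (assumed example)
    return np.array([x[0]**2 + x[1]**2, (x[0] - 1.0)**2 + 0.5 * x[1]**2])

def f(x):   # f = w(v(x)) with w = max, hence f is convex
    return np.max(v(x))

def jac_fd(x, h=1e-6):
    """Forward-difference approximation of the Jacobian of v;
    its error plays the role of kappa(h) = O(h) above."""
    m, n = len(v(x)), len(x)
    J = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (v(x + e) - v(x)) / h
    return J

def g_h(x, h=1e-6):
    """g_h(x) = sum_i xi_i * grad_h v_i(x) with xi in dw(v(x));
    for w = max, xi is a unit vector at an attaining index."""
    xi = np.zeros(len(v(x))); xi[np.argmax(v(x))] = 1.0
    return jac_fd(x, h).T @ xi

# numerical check of the eps_h-subgradient inequality (10)
rng = np.random.default_rng(0)
x = np.array([0.5, -0.3]); g = g_h(x)
for _ in range(100):
    z = x + rng.normal(size=2)
    assert f(z) >= f(x) + g @ (z - x) - 1e-4   # eps_h is tiny here
```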
As for the approximate function values, if $f$ is a max-type function of the form
$$f(x) = \sup \{ \phi_u(x) \mid u \in U \}, \quad \forall x \in \mathbb{R}^n, \tag{11}$$
where each $\phi_u : \mathbb{R}^n \to \mathbb{R}$ is convex and $U$ is an infinite set, then it may be impossible to calculate $f(x)$ exactly. However, for any positive $\varepsilon$ one can usually find in finite time an $\varepsilon$-solution to the maximization problem (11), that is, an element $u_\varepsilon \in U$ satisfying $\phi_{u_\varepsilon}(x) \ge f(x) - \varepsilon$. Then one may set $f_\varepsilon(x) = \phi_{u_\varepsilon}(x)$. On the other hand, in some applications,
calculating π‘’πœ€ for a prescribed πœ€ ≥ 0 may require much
less work than computing 𝑒0 . This is, for instance, the
case when the maximization problem (11) involves solving
a linear or discrete programming problem by the methods
of Gabasov and Kirilova [13]. Several authors have tried to solve (1) by assuming that the values of the objective function and its subgradients can only be computed approximately.
For example, Solodov [14] considers the proximal form of a
bundle algorithm for (1), assuming the values of the function
and its subgradients are evaluated approximately, and it is
shown how these approximations should be controlled in
order to satisfy the desired optimality tolerance. Kiwiel [15]
proposes an algorithm for (1), and the algorithm utilizes
the approximate evaluations of the objective function
and its subgradients; global convergence of the method is
obtained. Kiwiel [11] introduces another method for (1); it requires only approximate evaluations of $f$ and its $\varepsilon$-subgradients, and this method converges globally. Evidently, bundle methods that achieve superlinear convergence for (1) while using only approximate values of the objective and its subgradients are rarely available. Compared with the methods mentioned above, the method proposed in this
paper is not only implementable but also has a superlinear
rate of convergence under some additional assumptions, and
it should be noted that we only use the approximate values of
the objective function and its subgradients, which makes the
algorithm easier to implement.
Some notations are listed below for presenting the algorithm.
(i) $\partial f(x) = \{\xi \in \mathbb{R}^n \mid f(z) \ge f(x) + \xi^T(z - x), \ \forall z \in \mathbb{R}^n\}$, the subdifferential of $f$ at $x$; each such $\xi$ is called a subgradient of $f$ at $x$.
(ii) $\partial_\varepsilon f(x) = \{\eta \in \mathbb{R}^n \mid f(z) \ge f(x) + \eta^T(z - x) - \varepsilon, \ \forall z \in \mathbb{R}^n\}$, the $\varepsilon$-subdifferential of $f$ at $x$; each such $\eta$ is called an $\varepsilon$-subgradient of $f$ at $x$.
(iii) $p(x) = \arg\min_{z \in \mathbb{R}^n} \{ f(z) + (2\lambda)^{-1} \|z - x\|^2 \}$, the unique minimizer of (2).
(iv) $G(x) = \lambda^{-1}(x - p(x))$, the gradient of $F$ at $x$.
This paper is organized as follows: in Section 2, to
approximate the unique minimizer 𝑝(π‘₯) of (2), we introduce
the bundle idea, which uses approximate values of the objective function and its subgradients. The approximate quasi-Newton bundle-type algorithm is presented in Section 3. In
the last section, we prove the global convergence and, under
additional assumptions, Q-superlinear convergence of the
proposed algorithm.
2. The Approximation of 𝑝(π‘₯)
𝐹 (π‘₯π‘˜ ) = min𝑛 {𝑓 (π‘₯π‘˜ + 𝑠) + (2πœ†)−1 ‖𝑠‖2 } .
𝑠∈𝑅
(12)
Now we consider approximating 𝑓(π‘₯π‘˜ +𝑠) by using the bundle
idea. Suppose we have a bundle π½π‘˜ generated sequentially
starting from π‘₯π‘˜ and possibly a subset of the previous set used
to generate π‘₯π‘˜ . The bundle includes the data (𝑧𝑖 , 𝑓̃𝑖 , π‘”π‘Ž (𝑧𝑖 , πœ€π‘– )),
𝑖 ∈ π½π‘˜ , where 𝑧𝑖 ∈ 𝑅𝑛 , 𝑓̃𝑖 ∈ 𝑅, and π‘”π‘Ž (𝑧𝑖 , πœ€π‘– ) ∈ 𝑅𝑛 satisfy
$$f(z_i) \ge \tilde f_i \ge f(z_i) - \varepsilon_i, \qquad f(z) \ge \tilde f_i + \langle g_a(z_i, \varepsilon_i), z - z_i \rangle, \quad \forall z \in \mathbb{R}^n. \tag{13}$$
Suppose that the elements in π½π‘˜ can be arranged according
to the order of their entering the bundle. Without loss of
generality we may suppose π½π‘˜ = {1, . . . , 𝑗}. πœ€π‘– is updated by
the rule πœ€π‘–+1 = π›Ύπœ€π‘– , 0 < 𝛾 < 1, 𝑖 ∈ π½π‘˜ . The condition (13)
means π‘”π‘Ž (𝑧𝑖 , πœ€π‘– ) ∈ πœ•πœ€π‘– 𝑓(𝑧𝑖 ), 𝑖 ∈ π½π‘˜ . By using the data in the
bundle we construct a polyhedral function π‘“π‘Ž (π‘₯π‘˜ + 𝑠) defined
by
$$f_a(x^k + s) = \max_{i \in J_k} \{ \tilde f_i + g_a(z_i, \varepsilon_i)^T (x^k + s - z_i) \}. \tag{14}$$
Obviously $f_a(x^k + s)$ is a lower approximation of $f(x^k + s)$, so $f_a(x^k + s) \le f(x^k + s)$. We define a linearization error by
$$\alpha(x^k, z_i, \varepsilon_i) = \tilde f_{x^k} - \tilde f_i - g_a(z_i, \varepsilon_i)^T (x^k - z_i), \tag{15}$$
where $\tilde f_{x^k} \in \mathbb{R}$ satisfies
$$f(x^k) \ge \tilde f_{x^k} \ge f(x^k) - \varepsilon_{x^k} \tag{16}$$
for given $\varepsilon_{x^k} \ge 0$. Then $f_a(x^k + s)$ can be written as
$$f_a(x^k + s) = \tilde f_{x^k} + \max_{i \in J_k} \{ g_a(z_i, \varepsilon_i)^T s - \alpha(x^k, z_i, \varepsilon_i) \}. \tag{17}$$
Let
$$F_a(x^k) = \min_{s \in \mathbb{R}^n} \{ f_a(x^k + s) + (2\lambda)^{-1} \|s\|^2 \} = \tilde f_{x^k} + \min_{s \in \mathbb{R}^n} \Big\{ \max_{i \in J_k} \{ g_a(z_i, \varepsilon_i)^T s - \alpha(x^k, z_i, \varepsilon_i) \} + (2\lambda)^{-1} s^T s \Big\}. \tag{18}$$
The problem (18) can be dealt with by solving the following quadratic programming:
$$\min \; v + (2\lambda)^{-1} s^T s \quad \text{s.t.} \; g_a(z_i, \varepsilon_i)^T s - \alpha(x^k, z_i, \varepsilon_i) \le v, \quad \forall i \in J_k. \tag{19}$$
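For illustration, (19) can be handed to a general-purpose solver. The sketch below (our own, using SciPy's SLSQP method rather than a dedicated QP code) treats $(s, v)$ as a single variable vector; `G` stacks the $\varepsilon$-subgradients $g_a(z_i, \varepsilon_i)$ row-wise and `alpha` holds the linearization errors.

```python
import numpy as np
from scipy.optimize import minimize

def solve_bundle_qp(G, alpha, lam):
    """Solve (19): min_{s,v} v + (2*lam)^(-1) * s^T s subject to
    g_a(z_i, eps_i)^T s - alpha_i <= v for every bundle index i.
    Returns the minimizer s(x^k) and the optimal v."""
    m, n = G.shape
    x0 = np.zeros(n + 1)                         # unknowns: (s, v)
    obj = lambda x: x[-1] + x[:n] @ x[:n] / (2 * lam)
    cons = [{"type": "ineq",                     # v - g_i^T s + alpha_i >= 0
             "fun": lambda x, i=i: x[-1] - G[i] @ x[:n] + alpha[i]}
            for i in range(m)]
    res = minimize(obj, x0, method="SLSQP", constraints=cons)
    return res.x[:n], res.x[-1]
```

A serious implementation would exploit the special structure of (19) (or solve its dual over the unit simplex), but the generic call suffices to illustrate the mechanics.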
As iterations go along, the number of elements in the bundle $J_k$ increases. When the size of the bundle becomes too big, it may cause serious computational difficulties in the form of unbounded storage requirements. To overcome these difficulties, it is necessary to compress the bundle and clean the model. Wolfe [16] and Lemaréchal [17] first introduced the aggregation strategy, which requires storing only a limited number of subgradients; see also Kiwiel [18, 19] and Mifflin [20]. The aggregation strategy is the synthesis mechanism that condenses the essential information of the bundle into one single couple $(\hat g_{\varepsilon_k}, \hat\alpha_k)$ (defined below). The corresponding affine function, inserted in the model when there is compression, is called the aggregate linearization (defined below). This function summarizes all the information generated up to iteration $k$. Suppose $J_{\max}$ is the upper bound on the number of elements in $J_k$, $k = 1, 2, \ldots$. If $|J_k|$ reaches the prescribed $J_{\max}$, two or more of those elements are deleted from the bundle $J_k$; that is, two or more linear pieces in the constraints of (19) are discarded (notice that different selections of the discarded linear pieces may result in different speeds of convergence), and the aggregate linearization associated with the aggregate $\varepsilon$-subgradient and linearization error is introduced into the bundle. Define the aggregate linearization as
$$\tilde f_t(x^k + s) = \tilde f_{x^k} + \langle \hat g_{\varepsilon_k}, s \rangle - \hat\alpha_k, \tag{20}$$
where $\hat g_{\varepsilon_k} = \sum_{i \in J_k} \mu_i g_a(z_i, \varepsilon_i)$ and $\hat\alpha_k = \sum_{i \in J_k} \mu_i \alpha(x^k, z_i, \varepsilon_i)$, and the multiplier $\mu = (\mu_i)_{i \in J_k}$ is the optimal solution of the dual problem for (19), see Solodov [14]. By doing so, the surrogate aggregate linearization maintains the information of the deleted linear
pieces and at the same time the problem (19) is manageable
since the number of the elements in π½π‘˜ is limited. Suppose
𝑠(π‘₯π‘˜ ) solves the problem (19), and let π‘π‘Ž (π‘₯π‘˜ ) = π‘₯π‘˜ + 𝑠(π‘₯π‘˜ ) be
an approximation of 𝑝(π‘₯π‘˜ ) and πœ€π‘π‘Ž (π‘₯π‘˜ ) = πœ€π‘—+1 = π›Ύπœ€π‘— . Let
πΉπ‘Ž (π‘₯π‘˜ ) = 𝑓̃𝑝
where 𝑓̃𝑝
π‘Ž
(π‘₯π‘˜ )
π‘Ž
(π‘₯π‘˜ )
𝑇
+ πœ€π‘π‘Ž (π‘₯π‘˜ ) + (2πœ†)−1 𝑠(π‘₯π‘˜ ) 𝑠 (π‘₯π‘˜ ) ,
(21)
∈ 𝑅 is chosen to satisfy
𝑓 (π‘π‘Ž (π‘₯π‘˜ )) ≥ 𝑓̃𝑝
π‘Ž
(π‘₯π‘˜ )
≥ 𝑓 (π‘π‘Ž (π‘₯π‘˜ )) − πœ€π‘π‘Ž (π‘₯π‘˜ ) .
(22)
The results stated below are fundamental and useful in the
subsequent discussions.
(P1) πΉπ‘Ž (π‘₯π‘˜ ) ≤ 𝐹(π‘₯π‘˜ ) ≤ πΉπ‘Ž (π‘₯π‘˜ ).
(P2) πΉπ‘Ž (π‘₯π‘˜ ) = 𝐹(π‘₯π‘˜ ) if and only if π‘π‘Ž (π‘₯π‘˜ ) = 𝑝(π‘₯π‘˜ ) and
π‘Ž π‘˜
𝑓̃𝑝 (π‘₯ ) = 𝑓(𝑝(π‘₯π‘˜ )).
Note that 𝑝(π‘₯π‘˜ ) is the unique minimizer of (2) and (P1) and
(P2) can be obtained by the definitions of πΉπ‘Ž (π‘₯π‘˜ ), πΉπ‘Ž (π‘₯π‘˜ ), and
𝐹(π‘₯π‘˜ ).
(P3)
(i) If we define $F_e^a(x^k) = \min_{s \in \mathbb{R}^n} \{ \max_{i \in J_k} \{ f(z_i) + g(z_i)^T (x^k + s - z_i) \} + (2\lambda)^{-1} s^T s \}$, where $g(z_i) \in \partial f(z_i)$, then $F_a(x^k) \to F_e^a(x^k)$ as the new point $z_{j+1} = x^k + s(x^k)$ is appended to the bundle $J_k$ infinitely.
(ii) Let $\varepsilon = \max_{i \in J_k} \{\varepsilon_i\}$. If $g_a(z_i, \varepsilon_i) = g(z_i) \in \partial f(z_i)$, then $F_a(x^k) \ge F_e^a(x^k) - \varepsilon$.
Because $\varepsilon_i \to 0$ by the update rule $\varepsilon_{i+1} = \gamma \varepsilon_i$, $0 < \gamma < 1$, we have $g_a(z_i, \varepsilon_i) \to g(z_i)$ and $\tilde f_i \to f(z_i)$. Thus $f_a(x^k + s) \to \max_{i \in J_k} \{ f(z_i) + g(z_i)^T (x^k + s - z_i) \}$, so $F_a(x^k) \to F_e^a(x^k)$. It is easy to see that $f_a(x^k + s) = \max_{i \in J_k} \{ \tilde f_i + g_a(z_i, \varepsilon_i)^T (x^k + s - z_i) \} \ge \max_{i \in J_k} \{ f(z_i) + g_a(z_i, \varepsilon_i)^T (x^k + s - z_i) - \varepsilon_i \} \ge \max_{i \in J_k} \{ f(z_i) + g(z_i)^T (x^k + s - z_i) \} - \varepsilon$. Therefore, $F_a(x^k) = \min_{s \in \mathbb{R}^n} \{ f_a(x^k + s) + (2\lambda)^{-1} \|s\|^2 \} \ge F_e^a(x^k) - \varepsilon$.
Let
$$a(x^k) = F^a(x^k) - F_a(x^k). \tag{23}$$
We accept $p_a(x^k)$ as an approximation of $p(x^k)$ based on the following rule:
$$a(x^k) < m(x^k) \min \{ \lambda^{-2} s(x^k)^T s(x^k), L \}, \tag{24}$$
where $m(x^k)$ and $L$ are given positive numbers and $m(x^k)$ is fixed during one bundling process; that is, $m(x^k)$ depends on $x^k$, see Step 1 of the AQNBT algorithm presented in Section 3. If (24) is not satisfied, we let $z_{j+1} = x^k + s(x^k)$ and $\varepsilon_{j+1} = \gamma \varepsilon_j$, $0 < \gamma < 1$, and take $\tilde f_{j+1} = \tilde f_{p_a(x^k)}$ and $g_a(z_{j+1}, \varepsilon_{j+1}) \in \mathbb{R}^n$ satisfying
$$f(z_{j+1}) \ge \tilde f_{j+1} \ge f(z_{j+1}) - \varepsilon_{j+1}, \qquad f(z) \ge \tilde f_{j+1} + \langle g_a(z_{j+1}, \varepsilon_{j+1}), z - z_{j+1} \rangle, \quad \forall z \in \mathbb{R}^n, \tag{25}$$
and then append the new piece $\tilde f_{j+1} + g_a(z_{j+1}, \varepsilon_{j+1})^T (x^k + s - z_{j+1})$ to (14), replace $j$ by $j + 1$, and solve (19) to find a new $s(x^k)$ and $a(x^k)$ to be tested in (24). If this bundle process does not terminate, we have the following conclusion.
(P4) Suppose that $x^k$ is not the minimizer of $f$. If (24) is never satisfied, then $a(x^k) \to 0$ as the new point $z_{j+1}$ is appended to the bundle $J_k$ infinitely.
Suppose that |π½π‘˜ | = |{1, 2, . . . , 𝑗}| = 𝑗 < 𝐽max . Define the
functions πœ™ and πœ‘π‘—+1 , 𝑗 = 1, 2, . . . by
σ΅„©
σ΅„©2
πœ™ (𝑧) = 𝑓 (𝑧) + (2πœ†)−1 󡄩󡄩󡄩󡄩𝑧 − π‘₯π‘˜ σ΅„©σ΅„©σ΅„©σ΅„© ,
πœ‘π‘—+1 (𝑧) =
max
𝑖∈π½π‘˜ ={1,2,...,𝑗}
𝑇
{𝑓̃𝑖 + π‘”π‘Ž (𝑧𝑖 , πœ€π‘– ) (𝑧 − 𝑧𝑖 )}
(26)
σ΅„©
σ΅„©2
+ (2πœ†)−1 󡄩󡄩󡄩󡄩𝑧 − π‘₯π‘˜ σ΅„©σ΅„©σ΅„©σ΅„© .
Let 𝑧𝑗+1 be the unique minimizer of min𝑧∈𝑅𝑛 πœ‘π‘—+1 (𝑧), and
let 𝑧𝑗+2 be the unique minimizer of min𝑧∈𝑅𝑛 πœ‘π‘—+2 (𝑧), where
πœ‘π‘—+2 (𝑧) = max𝑖∈π½π‘˜+1 {𝑓̃𝑖 + π‘”π‘Ž (𝑧𝑖 , πœ€π‘– )𝑇 (𝑧 − 𝑧𝑖 )} + (2πœ†)−1 β€– 𝑧 −
π‘₯π‘˜ β€–2 . Note that if |{1, 2, . . . , 𝑗 + 1}| = 𝑗 + 1 < 𝐽max , then
let π½π‘˜+1 = {1, 2, . . . , 𝑗 + 1}, so πœ‘π‘—+1 (𝑧𝑗+1 ) ≤ πœ‘π‘—+2 (𝑧𝑗+2 ); if
|{1, 2, . . . , 𝑗 + 1}| = 𝑗 + 1 = 𝐽max , delete at least two elements
from {1, 2, . . . , 𝑗 + 1}, say π‘ž1 , π‘ž2 , and π‘ž1 =ΜΈ 𝑗 + 1, π‘ž2 =ΜΈ 𝑗 + 1,
the order of the other elements in {1, 2, . . . , 𝑗 + 1} are left
intact. Introduce an additional index Μƒπ‘˜ associated with the
aggregated πœ€-subgradient and linearization error into π½π‘˜+1
and let π½π‘˜+1 = {1, 2, . . . , π‘ž1 −1, π‘ž1 +1, . . . , π‘ž2 −1, π‘ž2 +1, . . . , Μƒπ‘˜, 𝑗+
1}, so |π½π‘˜+1 | = 𝑗 < 𝐽max . By adjusting πœ† appropriately,
we can make sure that 𝑧𝑗+1 and 𝑧𝑗+2 are not far away from
π‘₯π‘˜ . According to the proof of Proposition 3, see Fukushima
[21], we find that πœ™(𝑧𝑗 ) has limit, say πœ™∗ , and πœ‘π‘—+1 (𝑧𝑗+1 ) also
converges to πœ™∗ as 𝑗 → ∞. By the definitions of 𝐹(π‘₯π‘˜ ) and
πΉπ‘Ž (π‘₯π‘˜ ) we have πΉπ‘Ž (π‘₯π‘˜ ) → 𝐹(π‘₯π‘˜ ) and πΉπ‘Ž (π‘₯π‘˜ ) → 𝐹(π‘₯π‘˜ ) as
𝑗 → ∞, so π‘Ž(π‘₯π‘˜ ) → 0 as 𝑗 → ∞.
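The compression just described can be sketched as follows (our illustration; `mu` stands for the dual multipliers of (19) that define the aggregate couple in (20), and discarding the two oldest pieces is one admissible choice among many):

```python
def compress_bundle(G, alpha, mu, j_max):
    """If the bundle is full, delete two linear pieces and insert the
    aggregate linearization (20): g_hat = sum_i mu_i * g_i and
    alpha_hat = sum_i mu_i * alpha_i condense the deleted information."""
    if len(alpha) < j_max:
        return G, alpha                      # still room: nothing to do
    g_hat, a_hat = mu @ G, mu @ alpha        # aggregate couple of (20)
    keep = slice(2, len(alpha))              # drop the two oldest pieces
    return np.vstack([G[keep], g_hat]), np.append(alpha[keep], a_hat)
```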
In the next part we give the definition of $G_a(x^k)$, the approximation of $G(x^k)$:
$$G_a(x^k) = \lambda^{-1} (x^k - p_a(x^k)) = -\lambda^{-1} s(x^k), \tag{27}$$
and some properties of $G_a(x^k)$ are discussed. It is easy to see that the accuracy of this approximation of $G(x^k)$ is controlled by $a(x^k)$:
(P5) $\|G(x^k) - G_a(x^k)\| = \|\lambda^{-1}(p(x^k) - p_a(x^k))\| \le \sqrt{2 a(x^k)/\lambda}$.
By the strong convexity of $\phi(z)$, we have $\phi(p_a(x^k)) \ge \phi(p(x^k)) + (2\lambda)^{-1} \|p(x^k) - p_a(x^k)\|^2$. From the definitions of $F^a(x^k)$ and $p(x^k)$, we obtain $F^a(x^k) = \tilde f_{p_a(x^k)} + \varepsilon_{p_a(x^k)} + (2\lambda)^{-1} \|p_a(x^k) - x^k\|^2 \ge f(p_a(x^k)) + (2\lambda)^{-1} \|p_a(x^k) - x^k\|^2 = \phi(p_a(x^k)) \ge \phi(p(x^k)) + (2\lambda)^{-1} \|p(x^k) - p_a(x^k)\|^2 = F(x^k) + (2\lambda)^{-1} \|p(x^k) - p_a(x^k)\|^2$. By (P1), (P5) holds.
By (P4) and (P5), we have the following (P6). In fact,
(P6) says that the bundle subalgorithm for finding 𝑠(π‘₯π‘˜ )
terminates in finite steps.
(P6) If π‘₯π‘˜ does not minimize 𝑓, then we can find one
solution 𝑠(π‘₯π‘˜ ) of (18) such that (24) holds.
3. Approximate Quasi-Newton Bundle-Type Algorithm

For presenting the algorithm, we use the following notations: $a_k = a(x^k)$, $s^k = s(x^k)$, and $m_k = m(x^k)$. Given are positive numbers $\delta$, $\upsilon$, $\gamma$, and $L$ with $0 < \delta < 1$, $0 < \upsilon < 1$, and $0 < \gamma < 1$, and a symmetric $n \times n$ positive definite matrix $N$.

Approximate Quasi-Newton Bundle-Type Algorithm (AQNBT Alg):

Step 1 (initialization). Let $x^1$ be a starting point, and let $B_1$ be an $n \times n$ symmetric positive definite matrix. Let $\varepsilon_1$ and $\lambda$ be positive numbers. Choose a sequence of positive numbers $\{m_k\}_{k=1}^{\infty}$ such that $\sum_{k=1}^{\infty} m_k < \infty$. Set $k = 1$. Find $s^1 \in \mathbb{R}^n$ and $a_1$ such that
$$a_1 \le m_1 \min \{ \lambda^{-2} (s^1)^T s^1, L \}. \tag{28}$$
Let $G_a(x^1) = -\lambda^{-1} s^1$, $z_1 = x^1$, and $j = 1$, where $j$ is the running index of the bundle subalgorithm.

Step 2 (finding a search direction). If $\|G_a(x^k)\| = 0$, stop with $x^k$ optimal. Otherwise compute
$$d^k = -B_k^{-1} G_a(x^k). \tag{29}$$

Step 3 (line search). Starting with $u = 1$, let $i_k$ be the smallest nonnegative integer $u$ such that
$$F_a(x^k + \upsilon^u d^k) \le F^a(x^k) + \delta \upsilon^u (d^k)^T G_a(x^k), \tag{30}$$
where $\varepsilon_{u+1} = \gamma \varepsilon_u$ corresponds to the approximations $F_a(x^k + \upsilon^u d^k)$ and $F^a(x^k + \upsilon^u d^k)$ of $F$ at $x^k + \upsilon^u d^k$; $F^a(x^k + \upsilon^u d^k)$ satisfies
$$F^a(x^k + \upsilon^u d^k) - F_a(x^k + \upsilon^u d^k) \le m_{k+1} \min \{ \lambda^{-2} s(x^k + \upsilon^u d^k)^T s(x^k + \upsilon^u d^k), L \}, \tag{31}$$
$s(x^k + \upsilon^u d^k)$ is the solution of (19) in which $x^k$ is replaced by $x^k + \upsilon^u d^k$, and the expression of $F^a(x^k + \upsilon^u d^k)$ is similar to (21), with $x^k$ replaced by $x^k + \upsilon^u d^k$. Set $t_k = \upsilon^{i_k}$ and $x^{k+1} = x^k + t_k d^k$.

Step 4 (computing the approximate gradient). Compute $G_a(x^{k+1}) = -\lambda^{-1} s^{k+1}$.

Step 5 (updating $B_k$). Let $\Delta x^k = x^{k+1} - x^k$ and $\Delta g^k = G_a(x^{k+1}) - G_a(x^k)$. Set
$$B_{k+1} = \begin{cases} N, & \text{if } (\Delta x^k)^T \Delta g^k \le 0, \\ \text{any symmetric positive definite matrix satisfying } B_{k+1} \Delta x^k = \Delta g^k, & \text{otherwise.} \end{cases} \tag{32}$$
Set $k = k + 1$, and go to Step 2.

End of AQNBT algorithm.
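Assembling Steps 1-5 gives the following outer-loop sketch (again our own illustration built on the helpers above; the BFGS formula used in Step 5 is just one convenient matrix satisfying the secant equation $B_{k+1} \Delta x^k = \Delta g^k$ required by (32)):

```python
def aqnbt(x, oracle, lam=1.0, delta=0.4, ups=0.5, L=1.0,
          tol=1e-8, max_outer=50):
    """Sketch of the AQNBT outer loop. oracle(z, eps) -> (f_tilde, g)
    must satisfy (13); bundle_subalgorithm supplies s, a, F_a, F^a."""
    n = len(x)
    B = np.eye(n)                                 # B_1 = N = identity (assumed)
    m = [0.5 ** k for k in range(max_outer + 2)]  # summable sequence {m_k}

    def model(y, mk):                             # s, a, F_a, F^a at the point y
        return bundle_subalgorithm(y, oracle(y, 1e-2)[0], oracle, lam, mk, L)

    s, a, F_lo, F_up = model(x, m[1])             # Step 1
    for k in range(1, max_outer + 1):
        G = -s / lam                              # G_a(x^k), cf. (27)
        if np.linalg.norm(G) <= tol:              # Step 2: stopping test
            break
        d = np.linalg.solve(B, -G)                # d^k = -B_k^{-1} G_a(x^k), (29)
        t = 1.0
        for _ in range(60):                       # Step 3: backtrack on (30)
            s2, a2, F_lo2, F_up2 = model(x + t * d, m[k + 1])
            if F_lo2 <= F_up + delta * t * (d @ G):
                break
            t *= ups
        x2 = x + t * d
        G2 = -s2 / lam                            # Step 4: G_a(x^{k+1})
        dx, dg = x2 - x, G2 - G                   # Step 5: update B_k per (32)
        if dx @ dg <= 0:
            B = np.eye(n)                         # fall back to N
        else:                                     # BFGS update; B dx = dg holds
            Bdx = B @ dx
            B += np.outer(dg, dg) / (dx @ dg) - np.outer(Bdx, Bdx) / (dx @ Bdx)
        x, s, F_lo, F_up = x2, s2, F_lo2, F_up2
    return x
```

For example, `aqnbt(np.array([2.0, -3.0]), lambda z, e: (np.abs(z).sum(), np.sign(z)))` runs the sketch with an exact oracle for $f = \|\cdot\|_1$ (so $\varepsilon = 0$ in (13)); the hyperparameter values here are placeholders, not tuned recommendations.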
4. Convergence Analysis

In this section we prove the global convergence of the algorithm described in Section 3; furthermore, under the assumptions of semismoothness and regularity, we show that the proposed algorithm has a Q-superlinear convergence rate. Following the proof of Theorem 3, see Mifflin et al. [7], we can show that, at each iteration $k$, $i_k$ is well defined, and hence the stepsize $t_k > 0$ can be determined finitely in Step 3. We assume the proposed algorithm does not terminate in finitely many steps, so the sequence $\{x^k\}_{k=1}^{\infty}$ is infinite. Since the sequence $\{m_k\}_{k=1}^{\infty}$ satisfies $\sum_{k=1}^{\infty} m_k < \infty$, there exists a constant $W$ such that $\sum_{k=1}^{\infty} m_k \le W$. Let $D_a = \{x \in \mathbb{R}^n \mid F(x) \le F(x^1) + 2LW\}$. By making a slight change to the proof of Lemma 1, see Mifflin et al. [7], we have the following lemma.

Lemma 1. $F(x^{k+1}) \le F(x^k) + L(m_k + m_{k+1})$ for all $k \ge 1$ and $x^k \in D_a$.

Theorem 2. Suppose $f$ is bounded below and there exists a constant $\beta > 0$ such that
$$\langle B_k d, d \rangle \ge \beta \|d\|^2, \quad \forall d \in \mathbb{R}^n, \ \forall k. \tag{33}$$
Then any accumulation point of $\{x^k\}$ is an optimal solution of problem (1).
Proof. According to the first part of the proof of Theorem 3, see Mifflin et al. [7], we have $\lim_{k \to \infty} F(x^k) = F^*$. Since $m_k \to 0$, from (P1) we obtain $a_k \to 0$ as $k \to \infty$, and $\lim_{k \to \infty} F_a(x^k) = \lim_{k \to \infty} F^a(x^k) = F^*$. Thus
$$\lim_{k \to \infty} t_k (d^k)^T G_a(x^k) = 0. \tag{34}$$
Let $\bar x$ be an arbitrary accumulation point of $\{x^k\}$, and let $\{x^k\}_{k \in K}$ be a subsequence converging to $\bar x$. By (P5) we have
$$\lim_{k \in K, k \to \infty} G_a(x^k) = G(\bar x). \tag{35}$$
Since $\{B_k^{-1}\}$ is bounded, we may suppose
$$\lim_{k \in K, k \to \infty} d^k = \bar d \tag{36}$$
for some $\bar d \in \mathbb{R}^n$. Moreover we have
$$\lim_{k \in K, k \to \infty} \langle G_a(x^k), d^k \rangle = \langle G(\bar x), \bar d \rangle \le -\beta \|\bar d\|^2. \tag{37}$$
If $\liminf_{k \to \infty} t_k > 0$, then $\bar d = 0$. Otherwise, if $\liminf_{k \to \infty} t_k = 0$, by taking a subsequence if necessary we may assume $t_k \to 0$ for $k \in K$. The definition of $i_k$ in the line search rule gives
$$F_a(x^k + \upsilon^{i_k - 1} d^k) > F^a(x^k) + \delta \upsilon^{i_k - 1} (d^k)^T G_a(x^k), \tag{38}$$
where $\upsilon^{i_k - 1} = t_k / \upsilon$. So by (P1) we obtain
$$\frac{F(x^k + \upsilon^{i_k - 1} d^k) - F(x^k)}{\upsilon^{i_k - 1}} > \delta (d^k)^T G_a(x^k). \tag{39}$$
By taking the limit in (39) on the subsequence $k \in K$, we have
$$\bar d^T G(\bar x) \ge \delta \bar d^T G(\bar x). \tag{40}$$
In view of (37), the last inequality also gives $\bar d = 0$. Since $G_a(x^k) = -B_k d^k$ and $B_k$ is bounded, it follows from $\bar d = 0$ that
$$\lim_{k \to \infty, k \in K} G_a(x^k) = G(\bar x) = 0. \tag{41}$$
Therefore, $\bar x$ is an optimal solution of problem (1).
In the next part, we focus our attention on establishing
Q-superlinear convergence of the proposed algorithm.
Theorem 3. Suppose that the conditions of Theorem 2 hold and $\bar x$ is an optimal solution of (1). Assume that $G$ is BD-regular at $\bar x$. Then $\bar x$ is the unique optimal solution of (1) and the entire sequence $\{x^k\}$ converges to $\bar x$.
Proof. By the convexity and BD-regularity of $G$ at $\bar x$, $\bar x$ is the unique optimal solution of (3); for the proof, see Qi and Womersley [22]. So $\bar x$ is also the unique optimal solution of (1). This implies that both $f$ and $F$ must have compact level sets. By Lemma 1, $\{x^k\}$ has at least one accumulation point, and from Theorem 2 we know this accumulation point must be $\bar x$ since $\bar x$ is the unique solution of (1). Following the proof of Theorem 5.1, see Fukushima and Qi [1], we can prove that the entire sequence $\{x^k\}$ converges to $\bar x$.
The next theorem requires that the Lipschitz continuous gradient $G$ of $F$ be semismooth at the unique optimal solution of (1). This condition is satisfied, for instance, if $f$ is the maximum of several affine functions or if $f$ satisfies the constant rank constraint qualification.
Theorem 4. Suppose that the conditions of Theorem 3 hold and $G$ is semismooth at the unique optimal solution $\bar x$ of (1). Suppose further that
(i) $a_k = o(\|G(x^k)\|^2)$,
(ii) $\lim_{k \to \infty} \operatorname{dist}(B_k, \partial_B G(x^k)) = 0$,
(iii) $t_k \equiv 1$ for all large $k$.
Then $\{x^k\}$ converges to $\bar x$ Q-superlinearly.
Proof. Firstly, $\{x^k\}$ converges to $\bar x$ by Theorem 3. Then by condition (i) and (P5), we have
$$\|G_a(x^k) - G(x^k)\| = O(\sqrt{a_k}) = o(\|G(x^k)\|) = o(\|x^k - \bar x\|). \tag{42}$$
By condition (ii), there is a $\bar B_k \in \partial_B G(x^k)$ such that
$$\|B_k - \bar B_k\| = o(1). \tag{43}$$
Since $G$ is semismooth at $\bar x$, we have, according to Qi and Sun [13],
$$\|G(x^k) - G(\bar x) - \bar B_k (x^k - \bar x)\| = o(\|x^k - \bar x\|). \tag{44}$$
Noticing that $\|B_k^{-1}\| = O(1)$ and using (42)-(44) and condition (iii), for all large $k$ we have
$$\begin{aligned} \|x^{k+1} - \bar x\| &= \|x^k - \bar x - B_k^{-1} G_a(x^k)\| \\ &= \|B_k^{-1} [G_a(x^k) - G(x^k) + G(x^k) - G(\bar x) - \bar B_k (x^k - \bar x) + (\bar B_k - B_k)(x^k - \bar x)]\| \\ &\le \|B_k^{-1}\| \left[ \|G_a(x^k) - G(x^k)\| + \|G(x^k) - G(\bar x) - \bar B_k (x^k - \bar x)\| + \|\bar B_k - B_k\| \, \|x^k - \bar x\| \right] = o(\|x^k - \bar x\|). \end{aligned} \tag{45}$$
This establishes the Q-superlinear convergence of $\{x^k\}$ to $\bar x$.
Condition (i) can be replaced by the more realistic condition $a_k = o(\|G(x^{k-1})\|^2)$ without impairing the convergence result, since $a_k$ is chosen before $x^k$ is generated. For condition (ii), Fukushima and Qi [1] suggest one possible choice of $B_k$: we may expect $B_k$ to provide a reasonable approximation to an element of $\partial_B G(x^k)$, but it may be far from what we should approximate; there are some approaches to overcome this phenomenon, see Mifflin [10] and Qi and Chen [3]. As for condition (iii), if the conditions of Theorem 4, except (iii), hold and $0 < \delta < 1/2$, then condition (iii) holds automatically.
Acknowledgment
This research was partially supported by the National Natural
Science Foundation of China (Grants no. 11171049 and no.
11171138).
References
[1] M. Fukushima and L. Qi, “A globally and superlinearly convergent algorithm for nonsmooth convex minimization,” SIAM
Journal on Optimization, vol. 6, no. 4, pp. 1106–1120, 1996.
[2] A. I. Rauf and M. Fukushima, “Globally convergent BFGS
method for nonsmooth convex optimization,” Journal of Optimization Theory and Applications, vol. 104, no. 3, pp. 539–558,
2000.
[3] L. Qi and X. Chen, “A preconditioning proximal Newton
method for nondifferentiable convex optimization,” Mathematical Programming, vol. 76, no. 3, pp. 411–429, 1997.
[4] Y. R. He, “Minimizing and stationary sequences of convex
constrained minimization problems,” Journal of Optimization
Theory and Applications, vol. 111, no. 1, pp. 137–153, 2001.
[5] R. Mifflin and C. Sagastizábal, “A VU-proximal point algorithm
for minimization,” in Numerical Optimization, Universitext,
Springer, Berlin, Germany, 2002.
[6] C. Lemaréchal, F. Oustry, and C. Sagastizábal, “The π‘ˆLagrangian of a convex function,” Transactions of the American
Mathematical Society, vol. 352, no. 2, pp. 711–729, 2000.
[7] R. Mifflin, D. Sun, and L. Qi, “Quasi-Newton bundle-type
methods for nondifferentiable convex optimization,” SIAM
Journal on Optimization, vol. 8, no. 2, pp. 583–603, 1998.
[8] J. B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and
Minimization Algorithms, Springer, Berlin, Germany, 1993.
[9] X. Chen and M. Fukushima, “Proximal quasi-Newton methods
for nondifferentiable convex optimization,” Applied Mathematics Report 95/32, School of Mathematics, The University of New
South Wales, Sydney, Australia, 1995.
[10] R. Mifflin, “A quasi-second-order proximal bundle algorithm,”
Mathematical Programming, vol. 73, no. 1, pp. 51–72, 1996.
[11] K. C. Kiwiel, “Approximations in proximal bundle methods and
decomposition of convex programs,” Journal of Optimization
Theory and Applications, vol. 84, no. 3, pp. 529–548, 1995.
[12] M. Hintermüller, “A proximal bundle method based on approximate subgradients,” Computational Optimization and Applications, vol. 20, no. 3, pp. 245–266, 2001.
[13] R. Gabasov and F. M. Kirilova, Methods of Linear Programming, Part 3: Special Problems, Izdatel'stvo BGU, Minsk, Belarus, 1980 (Russian).
[14] M. V. Solodov, “On approximations with finite precision in
bundle methods for nonsmooth optimization,” Journal of Optimization Theory and Applications, vol. 119, no. 1, pp. 151–165,
2003.
[15] K. C. Kiwiel, “An algorithm for nonsmooth convex minimization with errors,” Mathematics of Computation, vol. 45, no. 171,
pp. 173–180, 1985.
[16] P. Wolfe, “A method of conjugate subgradients for minimizing
nondifferentiable functions,” Mathematical Programming Study,
vol. 3, pp. 145–173, 1975.
[17] C. Lemaréchal, “An extension of Davidon methods to nondifferentiable problems,” Mathematical Programming Study, vol.
3, pp. 95–109, 1975.
[18] K. C. Kiwiel, A Variable Metric Method of Centres for Nonsmooth
Minimization, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1981.
[19] K. C. Kiwiel, Efficient algorithms for nonsmooth optimization
and their applications [Ph.D. thesis], Department of Electronics,
Technical University of Warsaw, Warsaw, Poland, 1982.
[20] R. Mifflin, “A modification and extension of Lemarechal’s
algorithm for nonsmooth minimization,” Mathematical Programming Study, vol. 17, pp. 77–90, 1982.
[21] M. Fukushima, “A descent algorithm for nonsmooth convex
optimization,” Mathematical Programming, vol. 30, no. 2, pp.
163–175, 1984.
[22] L. Q. Qi and R. S. Womersley, “An SQP algorithm for extended
linear-quadratic problems in stochastic programming,” Annals
of Operations Research, vol. 56, pp. 251–285, 1995.