Hybrid extragradient proximal algorithm coupled with parametric approximation and penalty/barrier methods

Miguel Carrasco
Abstract. In this paper we study the hybrid extragradient method coupled with approximation and penalty schemes
for minimization problems. Under certain hypotheses, which include for example the case of Tikhonov regularization,
we prove convergence of the method to the solution set of our minimization problem. When we use penalty or barrier
schemes we can show convergence under the so-called slow/fast parametrization hypotheses, exploiting the existence
and finite length of the central path. Assuming only finite length of the central path, we can prove convergence of
the scheme to a solution of the constrained minimization problem when the functions belong to a special class of
functions.
1. Introduction
Throughout the paper, $H$ stands for a Hilbert space whose inner product (resp. norm) is denoted by $\langle\cdot,\cdot\rangle$ (resp. $\|\cdot\|$).
Let $\Gamma_0(H)$ be the space of all extended real valued, proper closed convex functions defined on $H$. For a function
$f \in \Gamma_0(H)$, the $\delta$-approximate subdifferential is defined by $\partial_\delta f(x) := \{g \in H \mid \forall y \in H,\ f(x) + \langle g, y - x\rangle \le f(y) + \delta\}$,
$\delta \ge 0$. We write $\partial f$ for $\partial_0 f$.
We want to solve the problem: find $\bar x \in H$ solution of

(P)  $\min_{x \in H} f(x)$.

We denote by $S(P)$ the set of optimal solutions of $(P)$ and we assume it to be nonempty.
In order to solve $(P)$ we can use the classical proximal point algorithm, PPA for short: given $x^0 \in H$ and a sequence
$\lambda_k > 0$, one solves iteratively

(1.1)  $x^{k+1} - x^k \in -\lambda_k \partial f(x^{k+1})$,

which is equivalent to finding

(1.2)  $x^{k+1} = \operatorname{argmin}_{x \in H} \left\{ f(x) + \frac{1}{2\lambda_k}\|x - x^k\|^2 \right\}$.
1991 Mathematics Subject Classification. Primary: 90C25, 65K05; Secondary: 49M30, 90C51.
Key words and phrases. Parametric approximation; diagonal iteration; proximal point; hybrid method; global convergence.
The iterates are well defined due to the strict convexity of $\|\cdot - x^k\|^2$.
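To fix ideas, here is a minimal numerical sketch (our own illustration, not part of the original development) of the exact iteration (1.2) for the one-dimensional function $f(x) = |x|$, whose proximal map has the closed-form soft-thresholding expression; the step sizes $\lambda_k$ below are hypothetical choices.

    import numpy as np

    def prox_abs(y, lam):
        # Exact proximal map of f(x) = |x|: argmin_x |x| + (1/(2*lam)) * (x - y)**2,
        # i.e. the soft-thresholding operator.
        return np.sign(y) * max(abs(y) - lam, 0.0)

    x = 5.0                      # x^0
    for k in range(30):
        lam_k = 1.0              # hypothetical step size, bounded away from 0
        x = prox_abs(x, lam_k)   # x^{k+1} solves (1.2) exactly
    print(x)                     # approaches the minimizer x = 0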
The PPA was introduced for maximal monotone operators in the context of variational inequalities by Martinet
[26, 27] in the 70's. Soon after, dealing also with maximal monotone operators, Rockafellar [29] extended the
results of Martinet and also introduced approximate versions of (1.1). Calling $P(x^k)$ the exact solution of (1.1) and
assuming that the procedure satisfies one of the following criteria

$\|P(x^k) - x^{k+1}\| \le \delta_k$  or  $\|P(x^k) - x^{k+1}\| \le \delta_k \|x^{k+1} - x^k\|$,

Rockafellar showed weak convergence of the iterates provided that $\sum \delta_k < \infty$.
For the specific case of convex minimization, which we focus on here, the literature is vast, see for example [6, 20, 25]
and references therein.
We note that in each step of PPA we must solve a minimization problem that can be as difficult as the original
problem. In order to produce implementable methods it is absolutely necessary to deal with inexact versions of
PPA. For example, for convex minimization Auslender [6] replaced the equation (1.2) by the approximate version
$x^{k+1} \in \delta_k\text{-}\operatorname{argmin}_{x \in H} \left\{ f(x) + \frac{1}{2\lambda_k}\|x - x^k\|^2 \right\}$,

and Lemaire [24] took the equation (1.1) with $\partial_\delta f$ in place of $\partial f$, as follows:

$x^{k+1} - x^k \in -\lambda_k \partial_{\delta_k} f(x^{k+1})$.

Both methods generate sequences that converge weakly to optimal solutions of $(P)$, under the hypotheses
$\sum \lambda_k \delta_k < \infty$ and $\lambda_k$ bounded away from 0 (or the less restrictive condition $\sum \lambda_k = \infty$).
Recently Solodov and Svaiter [31], in the context of finding zeroes of a maximal monotone operator $T$ and
$\varepsilon$-enlargements of $T$ (see [10] for details on $\varepsilon$-enlargements), combining the ideas of [30] and [10], proposed an inexact
version of PPA called the hybrid extragradient method, which shares some features with the pure extragradient method
studied in [23]. The method consists of three steps. The first step is the proximal step, where one finds $z^k$ satisfying

(1.3)  $z^k - x^k + \lambda_k g^k = \xi^k$,  $g^k \in T(z^k)$.

Next, for $\sigma \in [0, 1)$, one checks that the error $\xi^k$ satisfies one of the two following criteria

(1.4)  $\|\xi^k\| \le \sigma \lambda_k \|g^k\|/2$  or  $\|\xi^k\| \le \sigma \|x^k - z^k\|$.

Finally one makes the extragradient step

(1.5)  $x^{k+1} = x^k - \lambda_k g^k$.

The weak convergence of the hybrid extragradient method to a point in $T^{-1}(0)$ is proved in [31].
The hybrid extragradient method uses a relative error criterion which does not need the a priori summability
hypothesis $\sum \|\xi^k\| < \infty$ on the residues $\xi^k$, usually made in the study of classical proximal algorithms dealing with
errors. In contrast, it can be proved a posteriori, in the case above, that $\sum \|\xi^k\|^2 < \infty$.
The hybrid extragradient method is extended in [22] to treat the case of a maximal hypomonotone operator, that is,
an operator $T: H \rightrightarrows H$ satisfying, for some real number $q \ge 0$,

$\forall x, y \in H,\ g_x \in T(x),\ g_y \in T(y):\quad \langle g_x - g_y, x - y\rangle \ge -q\|x - y\|^2.$

The convergence of the sequence generated by the extragradient algorithm to a zero of the operator $T$ is shown there.
Solodov and Svaiter also considered in [30] an algorithm with a relative error criterion where the next iterate
is computed by projecting onto a certain hyperplane.
In order to include explicit constraints, or if we want to regularize the objective function, we can replace $f$ in the
minimization problem $(P)$ by a family of better behaved functions. We then consider a family of functions
$(f(\cdot, r))_{r>0}$, parametrized by $r$, that belongs to $\Gamma_0(H)$ and converges to $f$ as $r$ goes to 0. We apply the scheme

(1.6)  $\dfrac{x^{k+1} - x^k}{\lambda_k} + g^k = \xi^k$,  $g^k \in \partial_{\delta_k} f(x^{k+1}, r_k)$,

and at the end of each step we update $r_k$ by $r_{k+1} \le r_k$.
Cominetti [13] studied the problem (1.6), with $\xi^k = 0$, and showed convergence of $(x^k)$ to a particular solution
of $(P)$ for several approximation schemes that include, for example, Tikhonov regularization and penalty/barrier
methods in linear programming. The proofs of convergence are based on studying the behavior of the optimal path
defined by

$x(r) := \operatorname{argmin}_{x \in H} f(x, r).$

Assuming finite length of the optimal path, and when $(r_k)$ goes to zero sufficiently fast or sufficiently slow in a
certain sense, the weak convergence of $(x^k)$ to a point in $S(P)$ is proved.
Fast/slow parametrization hypotheses impose a priori conditions on how the parameter $r$ must go to
zero. For example, in the case of the log-barrier in linear programming, the fast and slow parametrization conditions
are complementary, which is what we want when we have an on-line rule to choose $r_k$. Nevertheless, in the case
of the exponential penalty in linear programming there exists a gap between the slow and fast cases; that is, there exist
sequences $(r_k)$ that satisfy neither the slow nor the fast parametrization hypothesis.
Alvarez et al. [3] studied the hybrid projection algorithm treated in [30] coupled with approximation methods
in convex programming, based on the work of Cominetti [13]. Similar convergence results are obtained under
fast/slow parametrization hypotheses.
Cominetti and Courdurier [15] considered the scheme (1.6) in convex programming, showing, under the hypothesis
$\sum \|\xi^k\| < +\infty$, the convergence of the iterates to a point in $S(P)$ without using fast/slow hypotheses, but assuming
that the objective function and the constraints belong to the class of so-called quasi-analytic functions (see
§4.2); for example, linear functions, quadratic functions and strictly convex functions belong to this class.
Our main objective in this paper is to show the convergence of sequences generated by the hybrid extragradient
method coupled with different approximation schemes. We will see that extragradient methods also allow us to
weaken the assumption of summability of the residues.
The paper is divided into three parts as follows: In the first part we give the tools that will be useful in the next
sections.
In the second part we study the convergence of the algorithm assuming certain hypotheses on the behavior of
(f (·, r)). We also extend the results to the case of Bregman functions following the works of [25, 32].
The third part deals with penalty/barrier methods. Using the finite length of the central path and the fast/slow
parametrization hypotheses we show weak convergence of the iterates to a point in $S(P)$. Finally we study the convergence of the sequences generated by the perturbed proximal scheme:

(1.7)  $\dfrac{x^{k+1} - x^k}{\lambda_k} \in -\alpha_k \partial_{\delta_k} f(x^{k+1} + \zeta^k, r_k)$

under the assumptions $\alpha_k > 0$ and $\sum \|\zeta^k\|^2 < \infty$.
Based on [15], we show convergence of the sequence to a point in $S(P)$, provided that the family $(f(\cdot, r))$ belongs
to a certain class of convex functions.
It is shown that equation (1.7) is equivalent to the extragradient method (see Remark 2.1). Then, as a corollary,
under the hypothesis of finite length of the central path, we prove the asymptotic convergence of the extragradient
method to a solution of the minimization problem $(P)$. Also, choosing $\alpha_k$ and $\zeta^k$ appropriately, we show the
equivalence between the scheme (1.7) and the hybrid projection method studied in [3]. We obtain as a corollary
the convergence of the hybrid projection method, which extends the results of [3] (see Corollary 4.22, §4.2).
2. Preliminaries
In this section and the following ones we consider two fixed sequences $(\lambda_k)$ with $\lambda_k > 0$ and $(\delta_k)$ with $\delta_k \ge 0$. We
will assume that the parameters satisfy the following condition:

(2.1)  $\sum \lambda_k \delta_k < \infty.$
Let (fk ) be a sequence in Γ0 (H) and consider the following algorithm.
• Proximal step. Given $x^k \in H$, $\lambda_k > 0$, $\delta_k \ge 0$, find $z^k \in H$ such that

(2.2)  $(z^k - x^k) + \lambda_k g^k = \xi^k$, for some $g^k \in \partial_{\delta_k} f_k(z^k)$.

• Error criterion. The residue $\xi^k \in H$ is required to satisfy one of the following conditions:

(2.3)  $\|\xi^k\| \le \sigma \dfrac{\lambda_k}{2}\|g^k\|$

or

(2.4)  $\|\xi^k\| \le \sqrt{\sigma}\,\|z^k - x^k\|$,

where $\sigma \in [0, 1)$ is the relative error tolerance.
• Extragradient step. If $g^k = 0$ set $x^{k+1} = x^k$; otherwise

(2.5)  $x^{k+1} = x^k - \lambda_k g^k.$
Obviously we could use the less restrictive condition $\|\xi^k\| \le \sigma\max\{\frac{\lambda_k}{2}\|g^k\|,\ \|z^k - x^k\|\}$ instead of (2.3) or (2.4), but
it seems to us more appropriate to take into account the features of each case.
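As an illustration of how (2.2)-(2.5) can be organized in practice, the following Python sketch (our own, not from the paper) applies the method to a smooth convex quadratic on $\mathbb{R}^2$, solving the proximal subproblem inexactly by a few gradient steps; here $\delta_k = 0$, exact gradients replace $\delta_k$-subgradients, and all tolerances and step sizes are hypothetical choices.

    import numpy as np

    # Quadratic test problem f(x) = 0.5 * x^T A x - b^T x (A symmetric positive definite).
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    grad = lambda x: A @ x - b

    def hybrid_extragradient(x, lam=0.5, sigma=0.9, iters=200, inner=5):
        for _ in range(iters):
            # Proximal step (2.2), solved inexactly: a few gradient steps on
            # z -> f(z) + ||z - x||^2 / (2*lam); delta_k = 0 in this sketch.
            z = x.copy()
            for _ in range(inner):
                z = z - 0.1 * (grad(z) + (z - x) / lam)
            g = grad(z)                      # g^k, the (exact) gradient at z^k
            xi = (z - x) + lam * g           # residue of (2.2)
            # Error criteria (2.3) / (2.4); if neither holds, redo the proximal
            # step with more inner iterations (crude accuracy control).
            ok = (np.linalg.norm(xi) <= sigma * lam * np.linalg.norm(g) / 2
                  or np.linalg.norm(xi) <= np.sqrt(sigma) * np.linalg.norm(z - x))
            if not ok:
                inner += 5
                continue
            x = x - lam * g                  # extragradient step (2.5)
        return x

    print(hybrid_extragradient(np.array([5.0, -3.0])))   # close to argmin f = A^{-1} b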
We recall that our goal in this paper is to give conditions ensuring convergence of the iterates generated by
the method to a solution of the minimization problem. We are also interested in conditions, possibly more
relaxed, allowing us to prove, for example, the convergence of the sequence of subgradients to 0 or the convergence
of the function values to the optimal value.
In [31] the authors show, with an example in $\mathbb{R}^2$, that the extragradient step is essential in order to have convergence
of the iterates. The example is the following: consider the scheme defined only by (1.3) and the second error criterion
in (1.4), with $x^{k+1}$ in place of $z^k$. Take $T(x) = Mx$ and $x^{k+1} = Qx^k$, where $M$ and $Q$ are defined by

$M = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad Q = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},$

take $\lambda_k = 1/2$, $\sigma \in (1/\sqrt{2}, 1)$ and $x^0 \ne (0, 0)$. It follows easily that equation (1.3) is equivalent to

$\xi^k = \left(\tfrac{1}{2} M Q + Q - I\right) x^k.$

It is easy to check that $\|\xi^k\| \le (1/\sqrt{2})\|x^k\|$ and $\|x^{k+1} - x^k\| = \|(Q - I)x^k\| = \|x^k\|$, so the residue satisfies
the second inequality in (1.4). Finally, it is easy to see that the sequence is unbounded although the
operator $T$ has a unique zero at the origin.
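The divergence in this example is easy to verify numerically; the short sketch below (illustrative only) iterates $x^{k+1} = Qx^k$, checks that the residue ratio stays at $1/\sqrt{2}$, and exhibits the growth of $\|x^k\|$.

    import numpy as np

    M = np.array([[0.0, -1.0], [1.0, 0.0]])
    Q = np.array([[1.0, 1.0], [-1.0, 1.0]])
    lam = 0.5
    R = lam * M @ Q + Q - np.eye(2)          # xi^k = R x^k

    x = np.array([1.0, 0.0])
    for k in range(10):
        xi = R @ x
        # ratio ||xi^k|| / ||x^k - x^{k+1}||, to compare with sigma in (1.4)
        ratio = np.linalg.norm(xi) / np.linalg.norm((Q - np.eye(2)) @ x)
        print(k, np.linalg.norm(x), ratio)
        x = Q @ x                             # next iterate, without the extragradient step

    # The ratio stays at 1/sqrt(2) ~ 0.707 < sigma, yet ||x^k|| grows like sqrt(2)^k.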
In spite of the previous example, one might think that, in the optimization case, the algorithm (2.2)-(2.4) without
the extragradient step could converge. However, Gárciga et al. [17] provide an example, in infinite dimension,
giving a negative answer. They build a function, based on Güler [19], that has minimizers but for which the hybrid extragradient
method does not converge and the iterates are unbounded. Nevertheless, for the finite
dimensional case it was proved by Humes and Silva [21] that, for $f_k \equiv f$ for all $k$ and $\lambda_k$ bounded from below
away from 0, every cluster point of a sequence generated by (2.2)-(2.4) is a minimizer of $f$, and we also have
convergence of the function values to the optimal value.
Remark 2.1. We note that $x^{k+1}$ can be written as $x^{k+1} = z^k - \xi^k$, see (2.2). Replacing $z^k = x^{k+1} + \xi^k$ in the
same equation (2.2), we obtain that the algorithm can be written equivalently as

(2.6)  $\dfrac{x^{k+1} - x^k}{\lambda_k} \in -\partial_{\delta_k} f_k(x^{k+1} + \xi^k)$,

where $\xi^k$ satisfies (2.3) or (2.4). Therefore, studying the properties of (2.2)-(2.5) and studying the system above are
equivalent. We will return to this equation in §4.2 when we analyze the convergence of general penalty schemes.
Lemma 2.2. It is easy to see that, in the case of (2.3), we have $\|z^k - x^k\| \le \|\xi^k\| + \lambda_k\|g^k\| \le \frac{\sigma+2}{2}\lambda_k\|g^k\|$, and
$\lambda_k\|g^k\| \le \frac{2}{2-\sigma}\|z^k - x^k\|$. In the same way we have the following estimates in the case of (2.4):
$\lambda_k\|g^k\| \le \|\xi^k\| + \|z^k - x^k\| \le (\sqrt{\sigma} + 1)\|z^k - x^k\|$, and $\|z^k - x^k\| \le \frac{1}{1-\sqrt{\sigma}}\lambda_k\|g^k\|$.
2.1. Convergence analysis. Let us start by establishing the following useful elementary identity.

Lemma 2.3. For any $u, x, y, z \in H$ one has the equality

(2.7)  $\|y - u\|^2 = \|x - u\|^2 - 2\langle y - x, u - z\rangle - \|x - z\|^2 + \|z - y\|^2.$

Proof. By direct computation,

$\|y - u\|^2 = \|x - u\|^2 + 2\langle y - x, x - u\rangle + \|y - x\|^2$
$= \|x - u\|^2 + 2\langle y - x, z - u\rangle + 2\langle y - x, x - z\rangle + \|y - x\|^2$
$= \|x - u\|^2 + 2\langle y - x, z - u\rangle - 2\|x - z\|^2 + 2\langle y - z, x - z\rangle + \|y - x\|^2.$

Putting the identity $\|y - x\|^2 = \|z - x\|^2 - 2\langle y - z, x - z\rangle + \|y - z\|^2$ into the last equality above, one gets the
result.
We can now state the following lemma, which will help us to show various convergence results.
Lemma 2.4. Let $(x^k)_{k\in\mathbb{N}}$ be a sequence generated by (2.2)-(2.5). Then, for every $u \in H$,

(2.8)  $\|x^{k+1} - u\|^2 \le \|x^k - u\|^2 + 2\lambda_k\delta_k + 2\lambda_k[f_k(u) - f_k(z^k)] - (1 - \sigma)\|E^k\|^2,$

where $E^k = \lambda_k g^k$ if we use the first error criterion (2.3), and $E^k = x^k - z^k$ if we use (2.4).

Proof. Fix any $u \in H$. Using Lemma 2.3 and the inclusion $g^k \in \partial_{\delta_k} f_k(z^k)$, we have

(2.9)  $\|x^{k+1} - u\|^2 = \|x^k - u\|^2 - 2\langle u - z^k, x^{k+1} - x^k\rangle - \|x^k - z^k\|^2 + \|z^k - x^{k+1}\|^2$
$\le \|x^k - u\|^2 + 2\lambda_k[f_k(u) - f_k(z^k) + \delta_k] - \|x^k - z^k\|^2 + \|z^k - x^{k+1}\|^2.$

Considering (2.3) and replacing $x^k - z^k$ by $\xi^k - \lambda_k g^k$ and $z^k - x^{k+1}$ by $\xi^k$, it is not difficult to see that

(2.10)  $\|x^{k+1} - u\|^2 \le \|x^k - u\|^2 + 2\lambda_k\delta_k + 2\lambda_k[f_k(u) - f_k(z^k)] - (1 - \sigma)\|\lambda_k g^k\|^2.$

If we use the second criterion (2.4), replacing $z^k - x^{k+1}$ by $\xi^k$, one gets, as above, that

(2.11)  $\|x^{k+1} - u\|^2 \le \|x^k - u\|^2 + 2\lambda_k\delta_k + 2\lambda_k[f_k(u) - f_k(z^k)] - (1 - \sigma)\|x^k - z^k\|^2.$
A direct consequence of the previous lemma is the following theorem, which states the convergence of $(x^k)$ when
$f_k \equiv f$ for all $k$. The proof below is an adaptation of that of Solodov and Svaiter [31, Corollary 4.2] to approximate
subdifferentials (instead of maximal monotone operators) in the context of optimization. It allows us to obtain some
new properties, like the last inequality in (2.12). An analogous result can be found in [3] for the hybrid projection
method in optimization.
Theorem 2.5. Let $(x^k)_{k\in\mathbb{N}}$ be a sequence generated by (2.2)-(2.5). Then, under (2.1):

(1) For every $u \in \operatorname{argmin} f$, the sequence $(\|x^k - u\|)_{k\in\mathbb{N}}$ converges and further:

(2.12)  $\sum \lambda_k^2\|g^k\|^2 < \infty$,  $\sum \|z^k - x^k\|^2 < \infty$  and  $\sum \lambda_k[f(z^k) - \min f] < \infty$.

(2) If $\dim H < \infty$ and $\sum \lambda_k = \infty$ then $(x^k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{argmin} f$.
(3) If $\dim H = \infty$ and $\inf_k \lambda_k > 0$ then $(x^k)_{k\in\mathbb{N}}$ converges weakly to a point in $\operatorname{argmin} f$.
Proof. Using Lemma 2.4, with $f$ in place of $f_k$ and for $u \in S = \operatorname{argmin} f$, we can easily obtain:

(2.13)  $\|x^{k+1} - u\|^2 \le \|x^k - u\|^2 + 2\lambda_k\delta_k.$

As $\sum \lambda_k\delta_k < \infty$, applying classical results on convergence of numerical sequences, it follows that $\|x^{k+1} - u\|$
converges. Going back to equation (2.8), we can deduce the following two inequalities:

(2.14)  $2\lambda_k[f(z^k) - \min f] \le \|x^k - u\|^2 - \|x^{k+1} - u\|^2 + 2\lambda_k\delta_k,$

and

(2.15)  $(1 - \sigma)\|E^k\|^2 \le \|x^k - u\|^2 - \|x^{k+1} - u\|^2 + 2\lambda_k\delta_k.$

Finally, summing over $k$ and using Lemma 2.2, the first item of the theorem holds.
In the case $\dim H < \infty$, using that $\sum \lambda_k = \infty$ we can deduce from (2.14) that $\liminf_k f(z^k) = \min f$, and then for
some subsequence of $(z^k)$ we have $\lim_j f(z^{k_j}) = \min f$. It follows from (1) that $(x^k)$ is bounded; using $\|z^k - x^k\| \to 0$
we have that $(z^k)$ is bounded too. Taking a subsequence again, if necessary, we can assume that $z^{k_j} \to u$ for some
$u \in H$. By the lsc of $f$ it follows that $u \in S$. Finally, it is easy to see that if $z^{k_j} \to u$ then $x^{k_j} \to u$ too. Therefore,
as $\|x^k - u\|$ converges, we have the convergence of $(x^k)$ to a point in $S = \operatorname{argmin} f$.
In the case of dim H = ∞ the previous argument is no longer valid. In order to conclude we can follow the arguments
of [2] and apply the following result from Opial [2, 28].
Lemma 2.6 (Opial). Let $H$ be a Hilbert space and $(x^k)_{k\in\mathbb{N}}$ a sequence in $H$ such that there exists a nonempty,
closed and convex set $C \subset H$ satisfying:
(a) For every $\bar x \in C$, $\lim_k \|x^k - \bar x\|$ exists.
(b) If $(x^{k_j})_{j\in\mathbb{N}}$ converges weakly to $\hat x$, then $\hat x \in C$.
Then there exists $x^\infty \in C$ such that $(x^k)_{k\in\mathbb{N}}$ converges weakly to $x^\infty$.
We take $C := S$. The first item of Opial's lemma has already been proved. In order to prove the second, we observe
that if $x^{k_j} \rightharpoonup \hat x$ then $z^{k_j} \rightharpoonup \hat x$ too. Using that $\inf_k \lambda_k > 0$ we have $\lim_k f(z^k) \le \min f$. Then, by the lsc of $f$,
we can conclude that the second item of Opial's lemma holds.
Remark 2.7. When $\dim H < \infty$ we can replace the condition $\sum \lambda_k = \infty$ by the less restrictive criterion $\sum \lambda_k^2 = \infty$
and still have convergence of the method. To prove this statement the reader can follow the arguments of
[2, Remark 2.1] or [30, Remark 2.3].
Remark 2.8. Using the previous proposition and the error inequalities (2.3), (2.4), it is easy to see that a posteriori
the sequence of residues $(\xi^k)$ satisfies $\sum \|\xi^k\|^2 < \infty$.
However, it may occur that $\sum \|\xi^k\| = \infty$, see Gárciga et al. [17], and then the convergence analysis cannot
be obtained as a consequence of results dealing with summable residues.
When $f_k \not\equiv f$ we have to introduce new hypotheses in order to prove convergence.
3. First general result on global convergence
Assume that $(x^k)_{k\in\mathbb{N}}$ is a sequence satisfying (2.2)-(2.5). Let $f \in \Gamma_0(H)$ with $\operatorname{argmin} f \ne \emptyset$; we suppose that the
family of functions $(f_k)_{k\in\mathbb{N}} \subset \Gamma_0(H)$ satisfies the following conditions:

(3.1)  $\forall k \in \mathbb{N},\ \forall x \in H,\quad f(x) \le f_k(x).$

(3.2)  $\forall k \in \mathbb{N},\ \forall \bar x \in \operatorname{argmin} f,\ \exists \eta_k(\bar x) \ge 0:\quad f_k(\bar x) \le \min f + \eta_k(\bar x).$
The same kind of hypotheses is considered in Alvarez et al. [3] for the hybrid projection method, and in Bahraoui and
Lemaire [8] for some proximal methods.
Theorem 3.1. Let $(x^k)_{k\in\mathbb{N}}$ be a sequence generated by (2.2)-(2.5). Assume that (3.1) and (3.2) hold, and that for
all $u \in \operatorname{argmin} f$

(3.3)  $\sum \lambda_k \eta_k(u) < \infty.$

Then:

(1) For every $u \in \operatorname{argmin} f$, the sequence $(\|x^k - u\|)_{k\in\mathbb{N}}$ converges and

$\sum \lambda_k^2\|g^k\|^2 < \infty$,  $\sum \|z^k - x^k\|^2 < \infty$,  and  $\sum \lambda_k[f(z^k) - \min f] < \infty$.

(2) If $\dim H < \infty$ and $\sum \lambda_k = \infty$ then $(x^k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{argmin} f$.
(3) If $\dim H = \infty$ and $\inf_k \lambda_k > 0$ then $(x^k)_{k\in\mathbb{N}}$ converges weakly to a point in $\operatorname{argmin} f$.
Proof. Observe first that Lemma 2.4 holds, and then it is easy to see that, for all $u \in \operatorname{argmin} f$, (3.1) and (3.2)
yield

(3.4)  $\|x^{k+1} - u\|^2 \le \|x^k - u\|^2 + 2\lambda_k\delta_k + 2\lambda_k\eta_k(u) + 2\lambda_k[f(u) - f(z^k)] - (1 - \sigma)\|E^k\|^2,$

where $E^k$ is the same as in Lemma 2.4. To conclude we use that $\sum \lambda_k\delta_k < \infty$ and $\sum \lambda_k\eta_k < \infty$, and we proceed
analogously to the proof of Theorem 2.5.
3.1. Convergence analysis using Bregman functions. We will extend our algorithm to include Bregman
functions, see Censor and Zenios [11]. Here $H = \mathbb{R}^n$. Let $C \subseteq \mathbb{R}^n$ and let $D_\varphi(\cdot, \cdot)$ be a Bregman distance, i.e.,

$D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle\nabla\varphi(y), x - y\rangle,$

where $\varphi$ is a Bregman function.
Definition 3.2. Recall that, given an open convex subset $C$ of $\mathbb{R}^n$, whose closure is denoted by $\bar C$, one says that
$\varphi: \bar C \to \mathbb{R}$ is a Bregman function with zone $C$ if
(1) $\varphi$ is strictly convex and continuous in $\bar C$;
(2) $\varphi$ is continuously differentiable in $C$;
(3) for any $x \in \bar C$ and $\alpha \in \mathbb{R}$, the right partial level set

$L(x, \alpha) = \{y \in C \mid D_\varphi(x, y) \le \alpha\}$

is bounded;
(4) if $(z^k)$ is a sequence in $C$ converging to $\bar x$, then $\lim_{k\to\infty} D_\varphi(\bar x, z^k) = 0$.
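As a concrete illustration (our own example, not taken from the paper), the negative entropy $\varphi(x) = \sum_i x_i\log x_i$ on the open positive orthant is a classical Bregman function whose distance is the generalized Kullback-Leibler divergence; a minimal sketch:

    import numpy as np

    def neg_entropy(x):
        # phi(x) = sum_i x_i * log(x_i), a classical Bregman function on the
        # open positive orthant (x > 0 componentwise).
        return np.sum(x * np.log(x))

    def bregman_dist(x, y):
        # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
        grad_phi_y = np.log(y) + 1.0
        return neg_entropy(x) - neg_entropy(y) - np.dot(grad_phi_y, x - y)

    x = np.array([0.2, 0.3, 0.5])
    y = np.array([0.4, 0.4, 0.2])
    print(bregman_dist(x, y))   # equals the (generalized) Kullback-Leibler divergence
    print(bregman_dist(x, x))   # 0; D_phi(x, y) >= 0 with equality iff x == y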
Let us recall the following two results which will be used in the proof of Theorem 3.6.
Theorem 3.3 (see [32]). Let $\varphi$ be a Bregman function with zone $C$. Let $(z^k)$ be a sequence in $\bar C$ and $(x^k)$ be a
sequence in $C$, such that

$\lim_{k\to\infty} D_\varphi(z^k, x^k) = 0.$

If $(z^k)$ or $(x^k)$ converges, then the other sequence converges to the same limit.
Lemma 3.4 (Four-Point Lemma, see [12]). Let $\varphi$ be a Bregman function with zone $C$. For all $x, z \in C$ and $u, v \in \bar C$
we have

$D_\varphi(u, z) = D_\varphi(u, x) + \langle\nabla\varphi(x) - \nabla\varphi(z), u - v\rangle + D_\varphi(v, z) - D_\varphi(v, x).$
Now let $f \in \Gamma_0(H)$ with $\operatorname{argmin}_C f \ne \emptyset$ and let $(f_k)_{k\in\mathbb{N}} \subset \Gamma_0(H)$ be a family of functions. We suppose that the
conditions (3.1), (3.2) are replaced by

(3.5)  $\forall k \in \mathbb{N},\ \forall x \in C,\quad f(x) \le f_k(x).$

(3.6)  $\forall k \in \mathbb{N},\ \forall \bar x \in \operatorname{argmin}_C f,\ \exists \eta_k(\bar x) \ge 0:\quad f_k(\bar x) \le \min_C f + \eta_k(\bar x).$
Additionally we will need the following assumptions, which guarantee that the algorithm is well defined (see [32]):

(3.7)  $C$ has nonempty interior and $C \cap \operatorname{dom}\partial f_k \ne \emptyset$ for all $k \in \mathbb{N}$.

(3.8)  $\forall x \in C,\ \lambda > 0,\ k \in \mathbb{N}$, the generalized proximal problem $0 \in \lambda\,\partial f_k(\cdot) + \nabla\varphi(\cdot) - \nabla\varphi(x)$ has a solution.

(3.9)  $\forall x \in C,\ \forall y^k \in C$: if $\lim_k y^k = y \in \operatorname{bdry} C$ then $\lim_{k\to\infty}\langle\nabla\varphi(y^k), y^k - x\rangle = +\infty$.
Now we are ready to introduce the hybrid extragradient method coupled with an approximation scheme in the
context of Bregman functions. Similar results were obtained by Lemaire [25] for a diagonal proximal method, see
equation (3.10).

Choose $\lambda_0 > 0$, $\delta_0 > 0$, $\sigma \in [0, 1)$, and $x^0 \in C$.
• Proximal step. Given $x^k \in C$, $\lambda_k > 0$, and $\delta_k \ge 0$, find $g^k \in H$, $z^k \in C$, and $z \in H$ satisfying:

(3.10)  $g^k \in \partial_{\delta_k} f_k(z^k)$  and  $\nabla\varphi(z) - \nabla\varphi(x^k) + \lambda_k g^k = 0.$

Remark 3.5. It follows, by Hypothesis (3.9), that $z \in C$.
• Error criterion. The point $z \in H$ is required to satisfy the following condition:

(3.11)  $D_\varphi(z^k, z) \le \sigma D_\varphi(z^k, x^k).$

• Next step. If the error criterion (3.11) is satisfied, then we set

(3.12)  $x^{k+1} = z.$
We can state the following theorem concerning the convergence of sequences generated by this algorithm.

Theorem 3.6. Let $(x^k)_{k\in\mathbb{N}}$ be a sequence generated by (3.10)-(3.12).
Assume that (3.5) and (3.6) hold, and that for all $u \in \operatorname{argmin}_C f$

(3.13)  $\sum \lambda_k \eta_k(u) < \infty.$

Then:

(1) For every $u \in \operatorname{argmin}_C f$, the sequence $(D_\varphi(u, x^k))_{k\in\mathbb{N}}$ converges and

$\sum D_\varphi(z^k, x^k) < \infty$  and  $\sum \lambda_k[f(z^k) - \min_C f] < \infty$.

(2) If $\sum \lambda_k = \infty$ then $(x^k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{argmin}_C f$.
Proof. Fix any $u \in S = \operatorname{argmin}_C f$ and put $\eta_k = \eta_k(u)$. By the four-point Lemma 3.4 we have

(3.14)  $D_\varphi(u, x^{k+1}) \le D_\varphi(u, x^k) + \langle\nabla\varphi(x^k) - \nabla\varphi(z), u - z^k\rangle + D_\varphi(z^k, z) - D_\varphi(z^k, x^k)$
$\le D_\varphi(u, x^k) + \lambda_k\langle g^k, u - z^k\rangle - (1 - \sigma)D_\varphi(z^k, x^k)$
$\le D_\varphi(u, x^k) + \lambda_k[f_k(u) - f_k(z^k)] + \lambda_k\delta_k - (1 - \sigma)D_\varphi(z^k, x^k)$
$\le D_\varphi(u, x^k) + \lambda_k[\min_C f + \eta_k - f_k(z^k)] + \lambda_k\delta_k - (1 - \sigma)D_\varphi(z^k, x^k)$
$\le D_\varphi(u, x^k) + \lambda_k[\min_C f - f(z^k)] + \lambda_k[\delta_k + \eta_k] - (1 - \sigma)D_\varphi(z^k, x^k),$

the second inequality being due to (3.10)-(3.12) and the fourth one to (3.6). Then, using that $z^k \in C$ and
$D_\varphi(z^k, x^k) \ge 0$, it follows that, for all $k$,

$D_\varphi(u, x^{k+1}) \le D_\varphi(u, x^k) + \lambda_k[\eta_k + \delta_k].$

The sequence

$D_\varphi(u, x^k) + \sum_{i=k}^{\infty}\lambda_i[\eta_i + \delta_i]$

is then decreasing and, since it is bounded from below, it converges. This implies that $D_\varphi(u, x^k)$ converges too;
hence, by (3) in Definition 3.2, $(x^k)$ is bounded. Using the convergence of the sequence $(D_\varphi(u, x^k))$, it follows
from (3.14) that $\sum \lambda_k[f(z^k) - \min_C f] < \infty$ and $\sum D_\varphi(z^k, x^k) < \infty$. The first statement then holds.
Now let $(z^{k_j})$ be such that $\lim_j f(z^{k_j}) = \liminf_k f(z^k)$. As $(x^k)$ is bounded it follows that $(x^{k_j})$ is bounded too, and
hence we may assume that it converges to some $\bar x$ (taking a subsequence if necessary). Using that $D_\varphi(z^k, x^k) \to 0$
and using Theorem 3.3 we have $z^{k_j} \to \bar x$. As $\sum \lambda_k = \infty$ and $\sum \lambda_k[f(z^k) - \min_C f] < \infty$, it is easily seen that $\bar x \in$
$\operatorname{argmin}_C f$. Taking then $u = \bar x$ in the first part, we get that $D_\varphi(\bar x, x^k)$ converges. As $x^{k_j} \to \bar x \in \operatorname{argmin}_C f \subseteq \bar C$,
we have (see (4) in Definition 3.2)

$\lim_k D_\varphi(\bar x, x^k) = \lim_j D_\varphi(\bar x, x^{k_j}) = 0,$

which implies, again by Theorem 3.3, $x^k \to \bar x$.
Example 3.1 (Viscosity method). Let $C$ be an open convex subset of $\mathbb{R}^n$ modeled by $\varphi$. Let $f \in \Gamma_0(H)$ be such
that $\operatorname{argmin}_C f$ is nonempty and bounded. Given $h \in \Gamma_0(H)$, a strictly convex, continuous and coercive function
with $\inf_H h = 0$, and $r_k > 0$ a small parameter intended to go to 0 as $k$ goes to infinity, the viscosity method,
see Attouch [4], consists in approximating $f$ by $f_k(x) = f(x) + r_k h(x)$. Clearly $f_k$ satisfies (3.5) and (3.6) with
$\eta_k(u) = r_k h(u)$. Finally, in order to have (3.13) in Theorem 3.6, it is sufficient to require

(3.15)  $\sum_k r_k\lambda_k < \infty.$
Example 3.2 (Log-exp approximation of minimax problems). Let $C$ be an open convex subset of $\mathbb{R}^n$ modeled by
$\varphi$. Let $h_i \in \Gamma_0(H)$, $i = 1, \ldots, m$ for $m \ge 2$, and define

$f(x) := \max_{1\le i\le m}\{h_i(x)\} \in \Gamma_0(H).$

We are interested in finding a point in $\operatorname{argmin}_C f$, which is supposed to be nonempty. In general, due to the max
operation, $f$ is not smooth even if every $h_i$ is, and this feature is inconvenient for the direct application of an
optimization algorithm to $f$. One standard way to regularize $f$ is to consider the log-exp approximation, see [9]:

$f_k(x) = r_k\log\left(\sum_{i=1}^{m}\exp[h_i(x)/r_k]\right),$

which preserves the regularity of the data and satisfies $f(x) \le f_k(x) \le f(x) + r_k\log m$ for all $x \in H$ and $r_k > 0$.
Taking $r_k$ satisfying (3.15), we have (3.13) in Theorem 3.6.
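A numerically stable way to evaluate this approximation is the log-sum-exp trick; the sketch below (illustrative, with made-up values $h_i(x)$) checks the bounds $f(x) \le f_k(x) \le f(x) + r_k\log m$ numerically.

    import numpy as np

    def logexp_smooth(h_vals, r):
        # f_k(x) = r * log( sum_i exp(h_i(x)/r) ), evaluated stably by factoring
        # out the maximum (log-sum-exp trick).
        m = np.max(h_vals)
        return m + r * np.log(np.sum(np.exp((h_vals - m) / r)))

    h_vals = np.array([1.0, 0.7, -0.2])   # hypothetical values h_1(x), ..., h_m(x)
    for r in [1.0, 0.1, 0.01]:
        fk = logexp_smooth(h_vals, r)
        f = np.max(h_vals)
        print(r, f, fk, f + r * np.log(len(h_vals)))   # f <= f_k <= f + r*log(m)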
4. Parametric approximation schemes
Let us consider the family of minimization problems of the type

$(P_r)$  $v(r) = \min\{f(x, r) \mid x \in H\}$,  $r > 0$,

where each $f(\cdot, r) \in \Gamma_0(H)$, and define

$S_r := \operatorname{argmin} f(\cdot, r).$
In general, $r > 0$ is a small parameter intended to go to 0. We assume that:

(4.1)  There exists a function $x: (0, r_0] \to H$ such that $\forall r \in (0, r_0]$, $x(r) \in S_r$.

(4.2)  The optimal path $x(\cdot)$ is absolutely continuous on $(0, r_0]$ and $\displaystyle\int_0^{r_0}\left\|\frac{dx}{dr}\right\| dr < \infty$.
Remark 4.1. The existence of the optimal path for certain families of functions has already been addressed,
see [5, 13]; for example, all viscosity methods satisfy this property, and some penalty and barrier methods
in linear programming also have this feature. As for the absolute continuity of the optimal path, which has been
used in [3, 5, 13], one can establish it in many applications. For example, when $f(\cdot, r)$ is twice differentiable at
$x(r)$, the property can be obtained by applying the implicit function theorem. If the optimal path is also Lipschitz,
then the finite length condition holds provided that $\|dx/dr\|$ is bounded above uniformly, see [13]. The finite length
condition can be checked in many interesting cases (see Example 4.1), but it is not always true. In fact, Torralba
[33] constructed a counterexample where this hypothesis fails.
We assume also that the family $(f(\cdot, r))_{r>0}$ satisfies the following conditions:

(4.3)  There exists $f \in \Gamma_0(H)$ such that $f(y^\infty) \le \liminf_{j\to\infty} f(y^j, r_j)$ for every $r_j \searrow 0$ and $y^j \rightharpoonup y^\infty$.

(4.4)  $\lim_{r\to 0} v(r) = \min f$ and there exists $x^* \in \operatorname{argmin} f$ such that $\lim_{r\searrow 0} x(r) = x^*$.
Remark 4.2. The previous assumptions are satisfied by a large class of approximation schemes, like penalty/barrier
methods in linear programming, see e.g. Alvarez [1] and Auslender et al. [7], and viscosity methods, see Attouch [4].
For any sequence of positive numbers $(r_k)_{k\in\mathbb{N}}$ converging to 0, we consider the sequence $(x^k)_{k\in\mathbb{N}}$ generated by the
algorithm (2.2)-(2.5) applied to $(f_k := f(\cdot, r_k))_{k\in\mathbb{N}}$.
Proposition 4.3. Assume that (4.1) and (4.2) hold. Given a sequence $r_k \searrow 0$ as $k \to \infty$, let $(x^k)_{k\in\mathbb{N}}$ be a sequence
generated by (2.2)-(2.5) applied to $(f_k := f(\cdot, r_k))_{k\in\mathbb{N}}$, under (2.1). Then:

(1) The real sequence $(\|x^{k+1} - x(r_k)\|)_{k\in\mathbb{N}}$ is convergent; hence $(x^k)$ is bounded.
(2) $\sum \lambda_k^2\|g^k\|^2 < \infty$,  $\sum \|z^k - x^k\|^2 < \infty$  and  $\sum \lambda_k[f(z^k, r_k) - v(r_k)] < \infty$.
Proof. As in the proof of Theorem 2.5, it is easy to show, using Lemma 2.4 with $x(r_k)$ in place of $u$, that

(4.5)  $\|x^{k+1} - x(r_k)\|^2 \le \|x^k - x(r_k)\|^2 + 2\lambda_k\delta_k + 2\lambda_k[f_k(x(r_k)) - f_k(z^k)] - (1 - \sigma)\|E^k\|^2.$

Then we have (since the third term on the right-hand side is nonpositive)

(4.6)  $\|x^{k+1} - x(r_k)\|^2 \le \left(\|x^k - x(r_{k-1})\| + \displaystyle\int_{r_k}^{r_{k-1}}\left\|\frac{dx}{dr}\right\| dr\right)^2 + 2\lambda_k\delta_k.$

In order to conclude we recall the following result on convergence of numerical sequences (see [13], Corollary 5.1).
Lemma 4.4. Let $(\mu_k)$, $(\zeta_k)$, $(\phi_k)$ be nonnegative sequences such that $\sum \mu_k < \infty$, $\sum \zeta_k < \infty$ and $\phi_k$ satisfies
$\phi_{k+1}^2 \le (\phi_k + \zeta_k)^2 + \mu_k$ for all $k \in \mathbb{N}$. Then $(\phi_k)$ converges as $k$ goes to $\infty$.

Putting $\mu_k := 2\lambda_k\delta_k$, $\zeta_k := \int_{r_k}^{r_{k-1}}\|dx/dr\|\,dr$, $\phi_k := \|x^{k+1} - x(r_k)\|$ and using the lemma above, we conclude that
$\|x^{k+1} - x(r_k)\|$ converges. The rest of the argument follows by summation in equation (4.5). We leave the details to
the reader.
Using the previous proposition we can also show that if $\inf_k \lambda_k > 0$ then $\lim_k f(z^k, r_k) = \min f$ and all the weak
cluster points of $(x^k)_{k\in\mathbb{N}}$ belong to $\operatorname{argmin} f$. If $\sum \lambda_k = \infty$ then there exists at least one weak cluster point of
$(x^k)_{k\in\mathbb{N}}$ that belongs to $\operatorname{argmin} f$; but we cannot directly conclude the convergence of the sequence $(x^k)$, so we have
to introduce extra hypotheses on the family $(f(\cdot, r_k))$ to reach our objective.
4.1. Asymptotic convergence under fast/slow parametrization.
4.1.1. Asymptotic convergence under fast parametrization. Let us reinforce (4.4) by assuming, for the element
$x^*$ given by (4.4), that

(4.7)  $\forall r \in (0, r_0],\ \forall \bar x \in \operatorname{argmin} f,\ \exists \eta(\bar x, r) \ge 0:\quad f(x(r) + \bar x - x^*, r) \le v(r) + \eta(\bar x, r),$

which is a perturbed variant of (3.2) that was first introduced in [13].
Theorem 4.5. Suppose that the family $(f(\cdot, r))_{r>0}$ satisfies (4.1)-(4.4) and (4.7). Let $(x^k)_{k\in\mathbb{N}}$ be generated by
(2.2)-(2.5) applied to $(f_k := f(\cdot, r_k))_{k\in\mathbb{N}}$, under (2.1), for some sequence $r_k \searrow 0$ as $k \to \infty$. If the following fast
parametrization condition holds:

(4.8)  $\forall \bar x \in \operatorname{argmin} f,\quad \sum \lambda_k\eta(\bar x, r_k) < \infty,$

then:

(1) For any $\bar x$ in $\operatorname{argmin} f$, $\lim_k \|x^k - \bar x\|$ exists.
(2) If $\dim H < \infty$ and $\sum \lambda_k = \infty$, the sequence $(x^k)_{k\in\mathbb{N}}$ converges to a point in $\operatorname{argmin} f$.
(3) If $\dim H = \infty$ and $\inf_k \lambda_k > 0$, the sequence $(x^k)_{k\in\mathbb{N}}$ converges weakly to a point in $\operatorname{argmin} f$.
Proof. We know that $\|x^{k+1} - x(r_k)\|$ converges and hence $\|x^{k+1} - x^*\|$ converges too; also

$\|x^{k+1} - \bar x\|^2 = \|x^{k+1} - x^*\|^2 + \|x^* - \bar x\|^2 + 2\langle x^{k+1} - x^*, x^* - \bar x\rangle.$

So, in order to prove the first statement, it must be proved that for any $\bar x \in \operatorname{argmin} f$ the sequence $\langle x^{k+1}, x^* - \bar x\rangle$
converges.
Proceeding analogously to [13], see also [3], we can show that

(4.9)  $\langle x^{k+1} - x(r_k), x^k - x^{k+1}\rangle \le \dfrac{1}{2}\left(\phi_k^2 - \phi_{k+1}^2\right) + M\zeta_k,$

where $\phi_k := \|x^k - x(r_{k-1})\|$, which is convergent by Proposition 4.3, $\zeta_k := \int_{r_k}^{r_{k-1}}\|dx/dr\|\,dr$, which is summable
according to (4.2), and $M > \phi_k$ for all $k \in \mathbb{N}$. On the other hand, if we use the first error criterion (2.3) and (4.7), we
easily see that, for $\eta_k := \eta(\bar x, r_k)$,

(4.10)  $\langle x^k - x^{k+1}, x(r_k) + (\bar x - x^*) - x^{k+1}\rangle \le \lambda_k(\delta_k + \eta_k) + \dfrac{\sigma}{2}\|\lambda_k g^k\|^2.$

If we use the second criterion (2.4) and (4.7), we can also see that

(4.11)  $\langle x^k - x^{k+1}, x(r_k) + (\bar x - x^*) - x^{k+1}\rangle \le \lambda_k(\delta_k + \eta_k) + \sqrt{\sigma}(\sqrt{\sigma} + 1)\|z^k - x^k\|^2.$

It is easy to see that the sequence $(a_k)_{k\in\mathbb{N}}$ defined by

$a_k := \langle x^k, x^* - \bar x\rangle + \tfrac{1}{2}\phi_k^2 + M\sum_{j\ge k}\zeta_j$

is bounded from below, because $(\phi_k)$ converges. Using the above arguments and (4.9) it is clear that

$a_{k+1} - a_k \le \langle x^k - x^{k+1}, x(r_k) + (\bar x - x^*) - x^{k+1}\rangle.$

In either case, using (4.10) or (4.11), we obtain that $a_{k+1} \le a_k + \varrho_k$ for certain $\varrho_k > 0$ with $\sum \varrho_k < \infty$, according
to Proposition 4.3, (2.1) and (4.8). Then $(a_k)$ converges, which proves the claim and hence the first item of the theorem.
Finally, to prove (2) and (3), we can use Proposition 4.3 and follow the arguments of Theorem 2.5 with $f_k$ in place
of $f$, where the lsc of $f$ is replaced by the generalized property (4.3) of the family $(f(\cdot, r))$; the details are left
to the reader.
4.1.2. Slow parametrization. We follow the ideas of [5, 13] by supposing that the family $(f(\cdot, r))_{r>0}$ satisfies
the following local strong convexity condition: for each $r > 0$ and for any bounded set $K$, there exists $\omega_K(r) > 0$
such that

(4.12)  $f(z, r) + \langle g, y - z\rangle + \omega_K(r)\|y - z\|^2 \le f(y, r)$

for all $y, z \in K$ and $g \in \partial f(z, r)$. Notice that it is allowed that $\omega_K(r) \to 0$ as $r \to 0$.
Theorem 4.6. Suppose that the family $(f(\cdot, r))_{r>0}$ satisfies (4.1)-(4.4) and (4.12). Let $(x^k)_{k\in\mathbb{N}}$ be generated by
(2.2)-(2.5) applied to $(f(\cdot, r_k))_{k\in\mathbb{N}}$, for some sequence $r_k \searrow 0$ as $k \to \infty$. If the following slow parametrization
condition holds:

(4.13)  $\sum \lambda_k\omega_K(r_k) = \infty,$

then $\lim_k x^k = x^* = \lim_{r\to 0} x(r)$.
Proof. Set $f_k := f(\cdot, r_k)$ and $\phi_k := \|x^k - x(r_{k-1})\|$ for all $k \in \mathbb{N}$. By Proposition 4.3 (1), $(\phi_k)_{k\in\mathbb{N}}$ converges.
We claim that its limit is 0. On the one hand, the convergence of $(\phi_k)_{k\in\mathbb{N}}$ along with (4.4) implies that the sequence
$(x^k)_{k\in\mathbb{N}}$ is bounded. On the other hand, Proposition 4.3 guarantees that $(z^k)_{k\in\mathbb{N}}$ is also bounded. Consequently,
there exists a bounded set $K$ such that $x^k$ and $z^k$ belong to $K$ for all $k \in \mathbb{N}$. Let $\omega_K$ be the modulus of strong
convexity associated with $K$, given by (4.12). As $0 \in \partial f_k(x(r_k))$, we have $\omega_K(r_k)\|z^k - x(r_k)\|^2 \le f_k(z^k) - f_k(x(r_k))$,
and hence Lemma 2.4 with $x(r_k)$ in place of $u$ yields

$\phi_{k+1}^2 \le \|x^k - x(r_k)\|^2 - 2\lambda_k\omega_K(r_k)\|z^k - x(r_k)\|^2 - (1 - \sigma)\|E^k\|^2 + 2\lambda_k\delta_k,$

and then

$\phi_{k+1}^2 \le \|x^k - x(r_k)\|^2 - 2\lambda_k\omega_K(r_k)\|z^k - x(r_k)\|^2 + 2\lambda_k\delta_k.$

Putting $\zeta_k := \int_{r_k}^{r_{k-1}}\|dx/dr\|\,dr$ and using the same technique as in the proof of Proposition 4.3 we get

$\phi_{k+1}^2 \le \phi_k^2 + 2M\zeta_k + \zeta_k^2 - 2\lambda_k\omega_K(r_k)\|z^k - x(r_k)\|^2 + 2\lambda_k\delta_k,$

for some $M \ge 0$ such that $\phi_k \le M$ for all $k \in \mathbb{N}$. Then

(4.14)  $2\lambda_k\omega_K(r_k)\|z^k - x(r_k)\|^2 \le \phi_k^2 - \phi_{k+1}^2 + 2M\zeta_k + \zeta_k^2 + 2\lambda_k\delta_k.$
Observing that the inequality $\sum_{j=1}^{k}(\phi_j^2 - \phi_{j+1}^2) = \phi_1^2 - \phi_{k+1}^2 \le \phi_1^2$ ensures the convergence of the series whose
general term is the right-hand side of (4.14), we deduce that

$\sum \lambda_k\omega_K(r_k)\|z^k - x(r_k)\|^2 < \infty.$

Under the slow parametrization assumption (4.13), we obtain that $\liminf_k \|z^k - x(r_k)\| = 0$. Further, the
inequality $\|x^{k+1} - z^k\| = \|\xi^k\| \le \max\left\{\sqrt{\sigma},\ \frac{2}{2-\sigma}\right\}\|z^k - x^k\|$ (see (2.2)-(2.4)), along with (2) in Proposition 4.3, gives
$\|x^{k+1} - z^k\| \to 0$. Therefore, taking lower limits in the inequality

$\|x^{k+1} - x(r_k)\| \le \|x^{k+1} - z^k\| + \|z^k - x(r_k)\|$

allows us to conclude that $\liminf_k \phi_{k+1} = 0$. As $\lim_k \phi_k$ exists by (1) of Proposition 4.3, we conclude that $\lim \phi_k = 0$
and the proof is complete.
The finite length hypothesis can be weakened by supposing that the family satisfies the strong convexity condition on
the whole space, as we show in the following theorem.
Theorem 4.7. Suppose that the family $(f(\cdot, r))_{r>0}$ satisfies (4.1), (4.3), (4.4), with (4.2) replaced by

(4.15)  The optimal path $x(\cdot)$ is absolutely continuous on $(0, r_0]$.

We assume also that the family satisfies (4.12), but with $\omega(r)$ a global modulus of strong convexity. Let $(x^k)_{k\in\mathbb{N}}$
be generated by (2.2)-(2.5) applied to $(f(\cdot, r_k))_{k\in\mathbb{N}}$, for some sequence $r_k \searrow 0$ as $k \to \infty$. If

$\sum \lambda_k\omega(r_k) = \infty,$

and if in addition
(1) for all $k \ge 0$, $\lambda_k\omega(r_k) \le (1 - \sigma)/(2\sigma)$,
(2) $\int_0^{r_0}\|dx/dr\|\,dr < \infty$ or $(1/\lambda_k\omega(r_k))\int_{r_k}^{r_{k-1}}\|dx/dr\|\,dr \to 0$,
(3) $\sum \lambda_k\delta_k < \infty$ or $\delta_k/\omega(r_k) \to 0$,
then $\lim_k x^k = x^* = \lim_{r\to 0} x(r)$.
Proof. Following a reasoning analogous to that of the preceding theorem, we obtain

(4.16)  $\phi_{k+1}^2 \le \|x^k - x(r_k)\|^2 - 2\lambda_k\omega(r_k)\|z^k - x(r_k)\|^2 - (1 - \sigma)\|E^k\|^2 + 2\lambda_k\delta_k.$

On the other hand, we note that

$\phi_{k+1}^2 = \|x^{k+1} - z^k\|^2 + \|z^k - x(r_k)\|^2 + 2\langle x^{k+1} - z^k, z^k - x(r_k)\rangle$
$\le 2\|x^{k+1} - z^k\|^2 + 2\|z^k - x(r_k)\|^2 = 2\|\xi^k\|^2 + 2\|z^k - x(r_k)\|^2.$

Then the inequality above implies that

$-\|z^k - x(r_k)\|^2 \le \|\xi^k\|^2 - \dfrac{1}{2}\phi_{k+1}^2,$
and hence, using this in inequality (4.16), we obtain

(4.17)  $(1 + \lambda_k\omega(r_k))\phi_{k+1}^2 \le \|x^k - x(r_k)\|^2 + 2\lambda_k\omega(r_k)\|\xi^k\|^2 - (1 - \sigma)\|E^k\|^2 + 2\lambda_k\delta_k.$

If we use the first error criterion, $E^k := \lambda_k g^k$, then

$(1 + \lambda_k\omega(r_k))\phi_{k+1}^2 \le \|x^k - x(r_k)\|^2 - (1 - \sigma - \sigma^2\lambda_k\omega(r_k)/2)\|E^k\|^2 + 2\lambda_k\delta_k.$

If we use the second error criterion, $E^k := z^k - x^k$, it holds that

$(1 + \lambda_k\omega(r_k))\phi_{k+1}^2 \le \|x^k - x(r_k)\|^2 - (1 - \sigma - 2\sigma\lambda_k\omega(r_k))\|E^k\|^2 + 2\lambda_k\delta_k.$

In either case, putting $\zeta_k := \int_{r_k}^{r_{k-1}}\|dx/dr\|\,dr$ and assuming that $\lambda_k\omega(r_k) \le (1 - \sigma)/(2\sigma)$, a triangle inequality
gives

(4.18)  $(1 + \lambda_k\omega(r_k))\phi_{k+1}^2 \le (\phi_k + \zeta_k)^2 + 2\lambda_k\delta_k.$

The conclusion follows by applying the following result on convergence of numerical sequences, see Cominetti [13].

Lemma 4.8. Let $(\mu_k)$, $(\nu_k)$, $(\zeta_k)$, $(\phi_k)$ be nonnegative sequences such that $\sum \nu_k = \infty$ and $\zeta_k \to 0$. Suppose also that
(1) $\sum \zeta_k < \infty$ or $\zeta_k/\nu_k \to 0$;
(2) $\sum \mu_k < \infty$ or $\mu_k/\nu_k \to 0$.
If $\phi_k$ satisfies $(1 + \nu_k)\phi_{k+1}^2 \le (\phi_k + \zeta_k)^2 + \mu_k$ for all $k \in \mathbb{N}$, then $(\phi_k)$ converges to 0 as $k$ goes to $\infty$.
Remark 4.9. We note that hypothesis (1) in Theorem 4.7 is not very restrictive; for example, in the case of
Tikhonov regularization it holds by taking $\lambda_k$ bounded above, see Example 4.2.
Example 4.1 (Penalty/barrier methods in linear programming). We consider the following linear program

$(LP)$  $\min_{x\in\mathbb{R}^n}\{c^t x \mid Ax \le b\}.$

We will assume that the feasible set $\{x \in \mathbb{R}^n \mid Ax \le b\}$ is bounded and has nonempty interior. We consider the
following approximation scheme:

$(LP_r)$  $\min_{x\in\mathbb{R}^n} f(x, r) := c^t x + r\sum_i \theta([a_i^t x - b_i]/r),$

where $a_i$ denotes the rows of $A$ and $\theta := \exp(\cdot)$ (exponential penalty) or $\theta(y) := -\log(-y)$ (log-barrier). It is known
that in such cases the family $(f(\cdot, r))$ satisfies hypotheses (4.1)-(4.4) and that the optimal path is Lipschitz (which
entails that it has finite length), see [18].
We can compute explicitly the parameters of fast/slow parametrization, see [13, 3]:
• Fast parametrization: $\eta := \alpha_1 r$ when we use the log-barrier, for some $\alpha_1 > 0$;
$\eta := \alpha_2 r\exp(-\beta_1/r)$ when we use the exponential penalty, for some $\alpha_2, \beta_1 > 0$.
• Slow parametrization: $\omega := \alpha_3 r$ when we use the log-barrier, for some $\alpha_3 > 0$;
$\omega := \alpha_4\exp(-\beta_2/r)/r$ when we use the exponential penalty, for some $\alpha_4, \beta_2 > 0$.
In the case of the log-barrier the convergence of the method is ensured independently of how $(r_k)$ goes to
zero. But, in the case of the exponential penalty, there exist sequences, like $r_k = 1/\log k$, that do not satisfy the
fast/slow hypotheses of Theorems 4.5 and 4.6. A small numerical illustration of the log-barrier path is sketched below.
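The following sketch is a toy illustration only (the data $A$, $b$, $c$ are made up and the inner solver is a crude gradient descent, not the method of this paper): it traces points $x(r)$ of the log-barrier central path for a small linear program; as $r$ decreases, $x(r)$ drifts toward the LP solution.

    import numpy as np

    # min c^T x  s.t.  A x <= b  (here the box 0 <= x_i <= 1; solution is (0, 0))
    c = np.array([1.0, 2.0])
    A = np.vstack([np.eye(2), -np.eye(2)])
    b = np.array([1.0, 1.0, 0.0, 0.0])

    def grad_barrier(x, r):
        # gradient of f(x, r) = c^T x - r * sum_i log(b_i - a_i^T x)
        s = b - A @ x                       # slacks, must stay strictly positive
        return c + r * (A.T @ (1.0 / s))

    x = np.array([0.5, 0.5])                # strictly feasible starting point
    for r in [1.0, 0.1, 0.01]:
        for _ in range(2000):
            g = grad_barrier(x, r)
            t = 1e-3
            # crude backtracking to keep the iterate strictly feasible
            while np.any(b - A @ (x - t * g) <= 0):
                t *= 0.5
            x = x - t * g
        print(r, x)    # x(r): approximately on the central path, drifting toward (0, 0)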
Example 4.2 (Viscosity method II, continued). The family $f(x, r) = f(x) + rh(x)$ satisfies the strong convexity
assumption when $h$ is strongly convex. For the case of Tikhonov regularization, i.e. $h(x) = \|x\|^2/2$, it is known that
the optimal path converges strongly, as $r$ goes to 0, to the element of minimal norm in $S(P) = \operatorname{argmin}_H f$; the
trajectory is also absolutely continuous on $(0, r_0]$, but we cannot ensure the finite length of the optimal path because we
only have the estimate $\|dx/dr\| \le O(1/r)$, see Attouch and Cominetti [5]. In fact there exist cases where this
hypothesis fails, see Torralba [33], and so Theorem 4.6 cannot be applied. Using Theorem 4.7 we can show as a
corollary that, under $\sum \lambda_k r_k = \infty$ and ($\sum \lambda_k\delta_k < \infty$ or $\delta_k/r_k \to 0$), the sequence $(x^k)$ generated by the extragradient
method applied to Tikhonov regularization satisfies the following: if $\int_0^{r_0}\|dx/dr\|\,dr < \infty$ or $\log(r_{k-1}/r_k)/\lambda_k r_k \to 0$,
then $x^k \to x^* = \lim_{r\to 0} x(r)$.
Remark 4.10. Theorems 4.5 and 4.6 can be adapted to cover the case of finding zeros of a maximal monotone
operator. Analogous hypotheses of fast/slow parametrization are used. The reader can easily adapt the scheme
used in [3] §5 to the extragradient method presented here.
4.2. Convergence of general penalty schemes without fast/slow techniques. Let $I = \{1, \ldots, m\}$, and
let $f_0, \ldots, f_m: \mathbb{R}^n \to \mathbb{R}$ be convex functions.
The aim of this section is to study the following mathematical programming problem:

(4.19)  $\min_{x\in\mathbb{R}^n}\{f_0(x) \mid f_i(x) \le 0,\ i \in I\},$

which is equivalent to $(P)$ by taking $f := f_0 + \psi_C$, where $\psi_C$ is the indicator function of the set
$C := \{x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i \in I\}$. We assume that $S(P)$, the set of solutions of the mathematical problem $(P)$, is
nonempty and bounded. In order to solve $(P)$ we consider the approximation scheme defined by

$f(x, r) = f_0(x) + r\sum_{i=1}^{m}\theta(f_i(x)/r),$

where $\theta: (-\infty, \kappa) \to \mathbb{R}$ and $\kappa \in [0, +\infty]$. The function $\theta(\cdot)$ is called a penalization function and satisfies the hypotheses of [15], which we
recall here:

(4.20a)  $\theta$ is smooth and convex;

(4.20b)  $\theta'(u) > 0$, with $\theta'(u) \to 0$ as $u \to -\infty$ and $\theta'(u) \to \infty$ as $u \to \kappa^-$.
As seen above (see Remark 2.1), the extragradient method is connected with the study of the limit points of
the following discretized dynamical system:

(4.21)  $\dfrac{x^{k+1} - x^k}{\lambda_k} \in -\partial_{\delta_k} f(x^{k+1} + \xi^k, r_k).$

Our main objective in this section is to prove the convergence of the sequences generated by (4.21) to a solution
of the mathematical programming problem (4.19).
We start with some technical properties concerning the sequence $(u^k)$ generated by (4.22), under some
assumptions related to certain approximations of the solution set $S(P)$. These results will allow us to
conclude the convergence of $(x^k)$, generated by (4.21), to a point in $\operatorname{argmin} f$ without assuming the slow/fast conditions
on the penalty parameter of the previous sections. Our proof is very close to that of [15]; we give it here
only for completeness. Recall first that for two bounded sets $B, B' \subseteq H$, the excess of $B$ over $B'$ is defined by
$e(B, B') = \sup_{x\in B} d(x, B')$.
Let $\varphi_k: E \to \mathbb{R}\cup\{\infty\}$ be a sequence of convex functions, with $E \subset \mathbb{R}^n$ a Euclidean subspace, and let $\alpha_k \ge 0$, $u^k \in E$
satisfy

(4.22)  $\dfrac{u^{k+1} - u^k}{\lambda_k} \in -\partial_{\delta_k}[\alpha_k\varphi_k](u^{k+1} + e^k),$

where $\sum \|e^k\|^2 < \infty$. Equation (4.22) corresponds to an implicit discretization of $\dot u \in -\partial[\alpha(t)\varphi_t](u(t) + e(t))$, where
$e(t) \in H$ tends to zero as $t$ goes to infinity. It could be a very interesting issue to study the behavior of this
dynamical system, but this is beyond the scope of this paper.
Theorem 4.11. Assume $\sum \lambda_k\alpha_k = \infty$ and $\sum \lambda_k\delta_k < \infty$. Let $S \subseteq E$ be nonempty and suppose that, for $\epsilon$ small enough,
there exist $B_\epsilon$ and $S_\epsilon$, with $S_\epsilon$ convex, closed and bounded, such that
(1) $e(B_\epsilon, S_\epsilon) \to 0$ and $e(S_\epsilon, S) \to 0$ as $\epsilon \to 0$;
(2) $\liminf_k m_k(\epsilon) > 0$, where $m_k(\epsilon) := \inf\{\varphi_k(x) - \varphi_k(y) \mid x \notin B_\epsilon,\ y \in S_\epsilon\}$.
Then $d(u^k, S) \to 0$ as $k$ goes to $\infty$.
Proof. Let $w^k = \operatorname{Proj}(u^k, S_\epsilon)$, the projection of $u^k$ onto $S_\epsilon$. Taking $x = u^k$, $y = u^{k+1}$, $u = w^k$, and
$z = z^k := u^{k+1} + e^k$ in Lemma 2.3 we have

$\|u^{k+1} - w^{k+1}\|^2 \le \|u^{k+1} - w^k\|^2$
$= \|u^k - w^k\|^2 - 2\langle w^k - z^k, u^{k+1} - u^k\rangle - \|u^k - z^k\|^2 + \|z^k - u^{k+1}\|^2$
$\le \|u^k - w^k\|^2 + 2\lambda_k\left\langle -\dfrac{u^{k+1} - u^k}{\lambda_k},\ w^k - z^k\right\rangle + \|e^k\|^2$
$\le \|u^k - w^k\|^2 + 2\lambda_k\alpha_k\left[\varphi_k(w^k) - \varphi_k(z^k)\right] + 2\lambda_k\delta_k + \|e^k\|^2,$
the last inequality being due to (4.22). Putting $D_k := \|u^k - w^k\|^2$ and $\zeta_k := 2\lambda_k\delta_k + \|e^k\|^2$, we may rewrite
the latter as

$D_{k+1} \le D_k + \zeta_k + 2\lambda_k\alpha_k\left[\varphi_k(w^k) - \varphi_k(z^k)\right],$

or, for $D_k' := D_k + \sum_{i=k}^{\infty}\zeta_i$, as

$D_{k+1}' \le D_k' + 2\lambda_k\alpha_k[\varphi_k(w^k) - \varphi_k(u^{k+1} + e^k)].$

Let $\bar k$ be large enough so that $\sum_{i=\bar k}^{\infty}\zeta_i \le \epsilon$ and $m_k(\epsilon) \ge d_\epsilon > 0$ for all $k \ge \bar k$, for some $d_\epsilon > 0$. Take
$a_\epsilon := [e(B_\epsilon, S_\epsilon) + \epsilon]^2 + \epsilon$.
If $\|u^{k+1} + e^k - \operatorname{Proj}(u^{k+1} + e^k, S_\epsilon)\| > e(B_\epsilon, S_\epsilon)$ we have $u^{k+1} + e^k \notin B_\epsilon$, hence

$\varphi_k(u^{k+1} + e^k) - \varphi_k(w^k) \ge m_k(\epsilon) \ge d_\epsilon > 0$

and then

(4.23)  $D_{k+1}' \le D_k' - 2\lambda_k\alpha_k d_\epsilon.$

On the other hand, if $\|u^{k+1} + e^k - \operatorname{Proj}(u^{k+1} + e^k, S_\epsilon)\| \le e(B_\epsilon, S_\epsilon)$, using the inequality
$\|\operatorname{Proj}(u^{k+1}, S_\epsilon) - \operatorname{Proj}(u^{k+1} + e^k, S_\epsilon)\| \le \|e^k\|$ we obtain

$\|u^{k+1} - w^{k+1}\| = \|u^{k+1} - \operatorname{Proj}(u^{k+1}, S_\epsilon)\| \le \|u^{k+1} - \operatorname{Proj}(u^{k+1} + e^k, S_\epsilon)\|$
$\le \|u^{k+1} + e^k - \operatorname{Proj}(u^{k+1} + e^k, S_\epsilon)\| + \|e^k\| \le e(B_\epsilon, S_\epsilon) + \epsilon$

for $k$ large enough. In either case $D_{k+1}'$ satisfies, for all $k$ large enough,

(4.24)  $D_{k+1}' \le \max\{D_k' - 2\lambda_k\alpha_k d_\epsilon,\ a_\epsilon\}.$

It is easy to see that there exists $\hat k$ such that for all $k \ge \hat k$, $D_k' \le a_\epsilon$. If not, we could assume that for all $k \ge \bar k$ we
have $D_k' > a_\epsilon$, which is contradictory with the fact that $\sum \lambda_k\alpha_k = \infty$.
Finally, for all $k$ large enough we have

(4.25)  $d(u^k, S) \le \|u^k - w^k\| + e(S_\epsilon, S)$
(4.26)  $\le \sqrt{a_\epsilon} + e(S_\epsilon, S),$

and we conclude by letting $\epsilon \to 0$.
Theorem 4.12. Assume $\sum \lambda_k(\alpha_k + \delta_k) < \infty$. Assume also that $S \subseteq E$ is nonempty, that $(u^k)$ is bounded and that
$\varphi_k(u^{k+1} + e^k)$ is bounded from below. Let $\bar u$ be a cluster point of $(u^k)$ for which there exists $u_\epsilon \in S$ satisfying
$u_\epsilon \to \bar u$ as $\epsilon \to 0$, with $\limsup_k \varphi_k(u_\epsilon) < \infty$. Then $u^k \to \bar u$.
Proof. Setting $D_k = \|u^k - u_\epsilon\|^2$ and proceeding as before, we obtain

(4.27)  $D_{k+1} \le D_k + 2\lambda_k\alpha_k[\varphi_k(u_\epsilon) - \varphi_k(u^{k+1} + e^k)] + 2\lambda_k\delta_k + \|e^k\|^2.$

We know that $(u^k)$ is bounded, $\varphi_k(u^{k+1} + e^k)$ is bounded from below and $\limsup_k \varphi_k(u_\epsilon) < \infty$. We may then find
$M \ge 0$ such that

$D_{k+1} \le D_k + \lambda_k\alpha_k M + 2\lambda_k\delta_k + \|e^k\|^2,$

and as $\sum_k[\lambda_k(\alpha_k + \delta_k) + \|e^k\|^2] < \infty$ we have that $D_k$ converges. Finally, taking the upper limit,

$\limsup_k \|u^k - \bar u\| \le \lim_k \|u^k - u_\epsilon\| + \|u_\epsilon - \bar u\| = 2\|u_\epsilon - \bar u\|,$

and the conclusion follows by letting $\epsilon \to 0$.
With the two main results above at hand, we are in a position to establish the main theorem of this section. First we
need to recall some definitions and results.
Definition 4.13 (see [14]). For $D \subseteq I$, we call the $\theta$-mean the function $M_\theta: (-\infty, \kappa)^{|D|} \to \mathbb{R}$ defined by

$M_\theta(x) = \theta^{-1}\left(\dfrac{1}{|D|}\sum_{i\in D}\theta(x_i)\right).$

We also define the asymptotic $\theta$-mean $A_\theta: (-\infty, \kappa)^{|D|} \to \mathbb{R}$ by

$A_\theta(x) = \lim_{r\to 0} r M_\theta(x/r).$

We will assume that the limit $A_\theta$ always exists.
Example 4.3. The limit $A_\theta$ exists for the following penalization functions (see [14]; a numerical check is sketched below):
• $\theta(u) = \exp(u)$: $A_\theta(x) = \max_{i\in D} x_i$.
• $\theta(u) = -1/u$ if $u < 0$, $+\infty$ otherwise: $A_\theta(x) = \left[\frac{1}{|D|}\sum_{i\in D} x_i^{-1}\right]^{-1}$.
• $\theta(u) = -\ln(-u)$ if $u < 0$, $+\infty$ otherwise: $A_\theta(x) = -\left[\prod_{i\in D}(-x_i)\right]^{1/|D|}$.
The function $A_\theta$ is, in general, convex, continuous, symmetric, positively homogeneous and componentwise
nondecreasing, with

(4.28)  $\dfrac{1}{|D|}\sum_{i\in D} x_i \le A_\theta(x) \le \max_{i\in D} x_i.$

Also, $A_\theta$ can be extended from $(-\infty, 0)^{|D|}$ to $\mathbb{R}_-^{|D|}$ keeping these properties, see [14]. We denote this extension by
$A_\theta$, keeping the same notation. We also suppose that

(4.29)  $\forall x, y \in \mathbb{R}_-^{|D|}:\quad \max_{i\in D} x_i \ne \max_{i\in D} y_i \ \Rightarrow\ \min_{z\in[x,y]} A_\theta(z) < \max\{A_\theta(x), A_\theta(y)\}.$
Remark 4.14. Of course for each function θ the θ-means and asymptotic θ-means depend on the set D ⊆ I. In
general we will use the notation Mθ (xi | i ∈ D) or Aθ (xi | i ∈ D), but when there is no ambiguity we will simply
write Mθ or Aθ respectively.
We define the class of quasi-analytic functions $\mathcal{Q}$ as the class of convex functions $f: \mathbb{R}^n \to \mathbb{R}$ which are constant
on the affine hull $\operatorname{aff}(C)$ of a convex set $C \subseteq \mathbb{R}^n$ whenever $f$ is constant on $C$, see [1, 15].
The class $\mathcal{Q}$ contains, for example, all strictly convex functions, quadratic functions and analytic functions.
This class is closed under composition with linear maps and under left composition with strictly increasing convex functions.
We will assume the following hypotheses on the functions $f_0, \ldots, f_m$:

(4.30a)  if $\kappa = 0$ there is a Slater point $\hat x$, i.e. $\max_{1\le i\le m} f_i(\hat x) < 0$;

(4.30b)  $f_0, \ldots, f_m \in \mathcal{Q}$, the class of quasi-analytic functions.
Definition 4.15. For each nonempty closed convex set $C \subseteq S(P)$ we let $E_C = \operatorname{aff}(C)$ and $I_C = \{i \in I \mid$
$f_i$ is non-constant on $C\}$. When $I_C$ is nonempty we set $v_C = \min \varphi_C$ and $S_C = \operatorname{argmin} \varphi_C$, where $\varphi_C: E_C \to \mathbb{R}\cup\{\infty\}$
is given by

(4.31)  $\varphi_C(u) = \begin{cases} A_\theta(f_i(u) \mid i \in I_C) & \text{if } u \in C,\\ +\infty & \text{otherwise.}\end{cases}$
Lemma 4.16 (see §2 of [15]). Let $C \subseteq S(P)$ be closed, convex and nonempty.
(1) If $I_C \ne \emptyset$ then there exists $\hat x$ such that $\max_{i\in I_C} f_i(\hat x) < 0$.
(2) Under (4.30b), $I_C = \emptyset$ if and only if $C$ is a singleton.
(3) Let $v^j \to v \in \mathbb{R}_-^d$ and $r_j \to 0$. Then $A_\theta(v) \le \liminf r_j M_\theta(v^j/r_j)$, and when $v \in (-\infty, 0)^d$ we have
$A_\theta(v) = \lim r_j M_\theta(v^j/r_j)$.
(4) Under (4.29), if $I_C \ne \emptyset$ then $S_C$ is a proper subset of $C$ and there exist $\beta \le 0$ and $j \in I_C$ with
$f_j(x) = \max_{i\in I_C} f_i(x) = \beta$ for all $x \in S_C$.
Lemma 4.17 (see Lemma 3.3 of [15]). Let $S := S(P)$. Then for $\epsilon \in (0, 1)$ there exist nonempty bounded closed
convex sets $B_\epsilon$, $S_\epsilon$ in $\mathbb{R}^n$ such that
(1) $e(B_\epsilon, S_\epsilon) \to 0$ and $e(S_\epsilon, S) \to 0$ as $\epsilon \to 0$;
(2) $\liminf_{k\to\infty} m_k(\epsilon) > 0$, where $m_k(\epsilon) := \inf\{f(x, r_k) - f(y, r_k) \mid x \notin B_\epsilon,\ y \in S_\epsilon\}$.
Theorem 4.18. Let $(x^k)$ be a sequence generated by the algorithm (4.21), where $\sum \|\xi^k\|^2 < \infty$, and suppose that
$(f(\cdot, r))$ satisfies (4.20a), (4.20b), (4.30a) and (4.30b). Then $(x^k)$ converges to a point in $S(P)$, i.e. to a solution
of the mathematical programming problem (4.19).
Proof. The proof is a slight modification of the proof of [15]. We write it here for completeness.
Using Theorem 4.11, with $u^k := x^k$, $\alpha_k = 1$ and $\varphi_k = f(\cdot, r_k)$, we get $d(x^k, S(P)) \to 0$. Let $C$ be the convex hull
of the cluster points of $(x^k)$. We claim that $I_C = \emptyset$.
By contradiction, let us suppose that $I_C \ne \emptyset$ and let us put $S := S_C$. Assume, translating if necessary, that $E_C$ is a
linear subspace. Decompose

(4.32)  $x^{k+1} = u^{k+1} + v^{k+1},$
(4.33)  $\xi^k = e^k + e_\perp^k,$

where $u^{k+1}, e^k \in E_C$ and $v^{k+1}, e_\perp^k \in E_C^\perp$; obviously $v^{k+1}, e^k, e_\perp^k \to 0$.
Fixing $s \in \partial_{\delta_k} f(x^{k+1} + \xi^k, r_k)$ we have, for all $w \in \mathbb{R}^n$,

$f(x^{k+1} + \xi^k, r_k) + \langle s, w - (x^{k+1} + \xi^k)\rangle \le f(w, r_k) + \delta_k.$

Writing $f_i(u^{k+1} + v^{k+1} + e^k + e_\perp^k) = f_i(u + v^{k+1} + e_\perp^k)$ for all $u \in E_C$ and $i \notin I_C$, and taking $w = u + v^{k+1} + e_\perp^k$
($u \in E_C$), we obtain

$q_k(u^{k+1} + e^k) + \langle s, u - (u^{k+1} + e^k)\rangle \le q_k(u) + \delta_k,$

where

$q_k(u) = r_k\sum_{i\in I_C}\theta\left(\dfrac{f_i(u + v^{k+1} + e_\perp^k)}{r_k}\right)$  for $u \in E_C$.

Taking $s = -(x^{k+1} - x^k)/\lambda_k = -(u^{k+1} - u^k)/\lambda_k - (v^{k+1} - v^k)/\lambda_k$, we get that $q_k$ satisfies (as $u - (u^{k+1} + e^k) \in E_C$)

$q_k(u^{k+1} + e^k) + \left\langle -\dfrac{u^{k+1} - u^k}{\lambda_k},\ u - (u^{k+1} + e^k)\right\rangle \le q_k(u) + \delta_k \qquad \forall u \in E_C,$

which means

$\dfrac{u^{k+1} - u^k}{\lambda_k} \in -\partial_{\delta_k} q_k(u^{k+1} + e^k).$
Let $h_k$ and $A_k$ be the functions defined by

(4.34)  $h_k = |I_C|\, r_k\, \theta\!\left(\dfrac{\cdot}{r_k}\right),$

(4.35)  $A_k(u) = r_k M_\theta\!\left(\dfrac{f_i(u + v^{k+1} + e_\perp^k)}{r_k}\ \Big|\ i \in I_C\right).$

As $q_k = h_k \circ A_k$, Theorem 4.19 below implies the existence of $\alpha_k \ge 0$ such that

(4.36)  $\dfrac{u^{k+1} - u^k}{\lambda_k} \in -\partial_{\delta_k}[\alpha_k A_k](u^{k+1} + e^k).$
Theorem 4.19 (Hiriart-Urruty and Lemaréchal, see [20], page 125). Let $f: \mathbb{R}^n \to \mathbb{R}$ be convex and let $g$ be increasing
and convex. Assume that the qualification condition $f(\mathbb{R}^n) \cap \operatorname{Int}(\operatorname{dom} g) \ne \emptyset$ holds. Then, for all $x$ such that
$f(x) \in \operatorname{dom} g$,

$s \in \partial_\delta(g \circ f)(x) \iff \exists\,\delta_1, \delta_2 \ge 0,\ \alpha \ge 0$ s.t. $\delta_1 + \delta_2 = \delta$, $s \in \partial_{\delta_1}(\alpha f)(x)$, $\alpha \in \partial_{\delta_2} g(f(x))$.
Take $d_k > d(u^{k+1} + e^k, C)$ with $d_k \to 0$, and define $\varphi_k: E_C \to \mathbb{R}\cup\{+\infty\}$ by

(4.37)  $\varphi_k(u) = \begin{cases} A_k(u) & \text{if } d(u, C) \le d_k,\\ +\infty & \text{otherwise.}\end{cases}$

We may then write (4.36) in the form

(4.38)  $\dfrac{u^{k+1} - u^k}{\lambda_k} \in -\partial_{\delta_k}[\alpha_k\varphi_k](u^{k+1} + e^k).$
We now distinguish the cases $\sum \alpha_k\lambda_k < \infty$ and $\sum \alpha_k\lambda_k = \infty$.
• Case $\sum \alpha_k\lambda_k < \infty$. We will use Theorem 4.12. If $\bar u$ is a cluster point of $(u^k)$, setting

$u_\epsilon = (1 - \epsilon)\bar u + \epsilon\hat x$,  where $\max_{i\in I_C} f_i(\hat x) < 0$,

we have (by Lemma 4.16 (3)) $\varphi_k(u_\epsilon) \to \varphi_C(u_\epsilon)$, and then by Theorem 4.12 we get $u^k \to \bar u$, which is
contradictory with $I_C \ne \emptyset$.
• Case $\sum \alpha_k\lambda_k = \infty$. We will use Theorem 4.11. In order to prove assertions (1) and (2) we set $B_\epsilon := \{x \in$
$E_C \mid d(x, S_C) \le \epsilon\}$ and define $\omega_\epsilon := \liminf_k \inf_{x\notin B_\epsilon}\{\varphi_k(x)\}$. We claim that $\omega_\epsilon > v_C$. If it were not the
case, we could find a subsequence $x^{k_j} \notin B_\epsilon$ such that $\lim_j \varphi_{k_j}(x^{k_j}) \le v_C$, which implies that, for $j$ large
enough, $d(x^{k_j}, C) \le d_{k_j}$ and then $(x^{k_j})$ is bounded. Taking a subsequence if necessary, we obtain that $x^{k_j}$
converges to some $\bar x$. Finally, by Lemma 4.16 (4),

$\varphi_C(\bar x) \le \liminf_j \varphi_{k_j}(x^{k_j}) \le v_C,$

so $\bar x \in S_C$, which is a contradiction with the fact that $x^{k_j} \notin B_\epsilon$ for all $j$.
Take $S_\epsilon = (1 - \gamma_\epsilon)S_C + \gamma_\epsilon\hat x$, where $\hat x \in C$ is given by Lemma 4.16 (1), and $\gamma_\epsilon \in [0, 1]$ is small enough such
that

(4.39)  $(1 - \gamma_\epsilon)v_C + \gamma_\epsilon\varphi_C(\hat x) < \omega_\epsilon,$

with $\gamma_\epsilon \to 0$ as $\epsilon \to 0$.
It is clear that $B_\epsilon$ and $S_\epsilon$, defined above, satisfy assumption (1) of Theorem 4.11. In order to verify
assumption (2), it is enough to prove $\omega_\epsilon > \limsup_k \sup_{y\in S_\epsilon}\varphi_k(y)$. Take a subsequence $(k_j)$ attaining
the upper limit and $y^k$ attaining the maximum of $\varphi_k$ over $S_\epsilon$; taking subsequences again if necessary, we can
assume that $y^{k_j} \to \bar y \in S_\epsilon$. Then, by Lemma 4.16 (3) and the definition of $S_\epsilon$, we have

$\lim_{j\to\infty}\varphi_{k_j}(y^{k_j}) = \varphi_C(\bar y) \le (1 - \gamma_\epsilon)v_C + \gamma_\epsilon\varphi_C(\hat x) < \omega_\epsilon,$

and assumption (2) of Theorem 4.11 is satisfied. This implies that $d(u^{k+1}, S_C) \to 0$, which is a
contradiction with the fact that $I_C \ne \emptyset$.
The previous theorem and Remark 2.1 permit us to show the convergence of sequences generated by the extragradient
method and the hybrid projection algorithm coupled with penalty/barrier schemes, without assuming the fast/slow
parametrization hypothesis of the previous section.
Corollary 4.20. Assume that the finite length hypothesis holds (see equation (4.2)). Then the sequence
$(x^k)$ generated by the algorithm (2.2)-(2.5) applied to the family $(f(\cdot, r_k))$ satisfies $\sum \|\xi^k\|^2 < \infty$, and also converges
to a point in $S(P)$.
Remark 4.21. If we take $\alpha_k = 1$ and $\varphi_k(\cdot) = f(\cdot, r_k)$ in Theorem 4.11 and use the same development as in Lemma
2.4, we can show that $d(x^k, S(P)) \to 0$ for a sequence $(x^k)$ generated by the algorithm (2.2)-(2.5), assuming only
the hypothesis $\xi^k \to 0$.
Consider now the hybrid projection algorithm studied in [3] (based on [30]), which we recall here:

(4.40)  Prox step:  $(z^k - x^k) + \lambda_k g^k = \eta^k$, for $g^k \in \partial_{\delta_k} f(z^k, r_k)$;

(4.41)  Error criterion:  $\|\eta^k\| \le \sigma\sqrt{\|z^k - x^k\|^2 + \|\lambda_k g^k\|^2}$;

(4.42)  Projection step:  $x^{k+1} = x^k - \beta_k g^k$ with $\beta_k = \langle g^k, x^k - z^k\rangle/\|g^k\|^2$.
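The projection step (4.42) computes the orthogonal projection of $x^k$ onto the hyperplane $\{w \mid \langle g^k, w - z^k\rangle = 0\}$ (cf. the description of [30] in the introduction); a minimal sketch of a single step, with hypothetical data:

    import numpy as np

    def projection_step(x, z, g):
        # x^{k+1} = x^k - beta_k * g^k with beta_k = <g^k, x^k - z^k> / ||g^k||^2,
        # i.e. the projection of x^k onto the hyperplane {w : <g^k, w - z^k> = 0}.
        beta = np.dot(g, x - z) / np.dot(g, g)
        return x - beta * g

    x = np.array([2.0, 1.0])        # current iterate x^k (hypothetical)
    z = np.array([1.5, 0.5])        # inexact proximal point z^k (hypothetical)
    g = np.array([1.0, 1.0])        # g^k in the approximate subdifferential at z^k
    x_next = projection_step(x, z, g)
    print(x_next, np.dot(g, x_next - z))   # second value is 0: x^{k+1} lies on the hyperplane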
In [3] the authors show the convergence of this algorithm under the fast/slow parametrization schemes and the finite
length hypothesis.
It is easy to see that $(x^k)$ satisfies

$\dfrac{x^{k+1} - x^k}{\lambda_k} \in -\dfrac{\beta_k}{\lambda_k}\partial_{\delta_k} f(x^{k+1} + \xi^k, r_k),$

where $\xi^k = z^k - x^k + \beta_k g^k$. In [3] the authors also prove that $\sum \|\xi^k\|^2 < \infty$, $\beta_k > 0$ and $\sum \beta_k/\lambda_k = \infty$. Therefore
we can show the following result:
we can show the following result:
Corollary 4.22. Assuming the finite length hypothesis holds (see equation 4.2) then we have that the sequence
P
(xk ) generated by the previous algorithm, (4.40)- (4.42), applied to the family (f (·, rk )) satisfies
kξk k2 < ∞, and
also converges to a point in S(P ).
References
[1] Alvarez F., Absolute minimizer in convex programming by exponential penalty, J. Convex Anal. 7, No 1 (2000), pp. 197-202.
[2] Alvarez F., Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators
in Hilbert space, SIAM J. Optim. 14 (2004), pp. 773-782.
[3] Alvarez F., Carrasco, M., Pichard, K., Convergence of a hybrid projection-proximal point algorithm coupled with approximation
methods in convex optimization, Math. Oper. Res. Vol 30, No 4 (2005), pp. 966-984.
[4] Attouch H., Viscosity solutions of minimization problems, SIAM J. Optimization 6 (1996), pp. 769-806.
[5] Attouch H., Cominetti R., A dynamical approach to convex minimization coupling approximation with the steepest descent method,
J. Diff. Equations 128 (1996), pp. 519-540.
[6] Auslender A., Numerical methods for nondifferentiable convex optimization, Mathematical Programming Studies, Vol. 30 (1987), pp. 102-127.
[7] Auslender A., Cominetti R., Haddou M., Asymptotic analysis for penalty methods in convex and linear programming, Math. Oper.
Res. 22 (1997), pp. 43-62.
[8] Bahraoui M.A., Lemaire B., Convergence of diagonally stationary sequences in convex optimization, Set-Valued Anal. 2 (1994) pp.
49–61.
[9] Bertsekas D.P., Approximation procedures based on the method of multipliers, J. Optim. Theory Appl. 23 (1977), pp. 487-510.
[10] Burachik, R., Iusem, A., Svaiter, B. F., Enlargement of monotone operators with applications to variational inequalities, Set-Valued
Anal. 5 (1997), pp. 159-180.
[11] Censor, Y., Zenios, S. A., Proximal minimization algorithm with D-functions, J. Optim. Theory Appl. 73 (1992), No 3, pp. 451-464.
[12] Chen, G., Teboulle, M., Convergence analysis of a proximal-like optimization algorithm using Bregman functions, SIAM J. Optim.
3 (1993), No 3, pp. 538-543.
[13] Cominetti R., Coupling the proximal point algorithm with approximation methods, J. Optim. Theory Appl. 95 (1997), pp. 581-600.
[14] Cominetti R., Nonlinear averages and convergence of penalty trajectories in convex programming, in Ill-Posed Variational Problems
and Regularization Techniques (Trier, 1998), Lecture Notes in Econom. and Math. Systems 477, Springer, Berlin, 1999, pp. 65-78.
[15] Cominetti R., Courdurier M., Coupling general penalty schemes for convex programming with the steepest descent and the proximal
point algorithm, SIAM J. Optim. Vol 13, No 3 (2003), pp. 745-765.
[16] Eckstein J., Bertsekas D.P., On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone
operators, Math. Program. 55 (1992), pp. 293-318.
[17] Gárciga Otero, R., Iusem, A., Svaiter, B. F., On the need for hybrid steps in hybrid proximal point methods, Oper.
Res. Lett. Vol 29, No 5 (2001), pp. 217-220.
[18] Gonzaga C.G., Path-following methods for linear programming, SIAM Review 34 (1992), pp. 167-224.
[19] Güler, O., On the convergence of the proximal point algorithm for convex minimization, SIAM J. Control Optim. 29 (1991), pp.
403-419.
[20] Hiriart-Urruty J.-B., Lemaréchal C., Convex analysis and minimization algorithms. II. Advanced theory and bundle methods.
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 306. Springer-Verlag, Berlin,
1993.
[21] Humes C., Silva P., Inexact proximal point algorithms and descent methods in optimization, Optimization and Engineering 6 (2005),
pp. 257-271.
[22] Iusem, A., Pennanen, T., Svaiter, B. F.,Inexact variants of the proximal point algorithm without monotonicity, SIAM J. Optim.
Vol 13 No 4 (2003), pp. 1080-1097.
[23] Korpelevich, G., The extragradient method for finding saddle points and other problems, Matecon 13 (1976), pp. 747-756.
[24] Lemaire B., About the convergence of the proximal method, Advances in Optimization, Proc. Lambrecht 1991, Lecture Notes in
Economics and Mathematical Systems, Springer-Verlag, New York (1992), pp. 39-51.
[25] Lemaire B., On the convergence of some iterative methods for convex minimization, Recent developments in optimization (Dijon,
1994), 252–268, Lecture Notes in Econom. and Math. Systems, 429, Springer, Berlin, 1995.
[26] Martinet B., Régularisation d’inéquations variationnelles par approximations successives, Revue Française d’Informatique et
Recherche Operationnelle, 4 (1970), pp. 154-159.
[27] Martinet B., Algorithmes pour la résolution de problèmes d'optimisation et de minimax, Thesis, Université de Grenoble, 1972.
[28] Opial Z., Weak convergence of the sequence of successive approximations for nonexpansive mappings, Bull. of the American Math.
Society 73, (1967), pp. 591-597.
[29] Rockafellar R.T., Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877-898.
[30] Solodov M.V., Svaiter B.F., A hybrid projection-proximal point algorithm, J. Convex Anal. 6 (1999), pp. 59-70.
[31] Solodov M.V., Svaiter B.F., A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal
monotone operator, Set-Valued Anal. 7 (1999), pp. 323-345.
[32] Solodov M.V., Svaiter B.F., An inexact hybrid generalized proximal point algorithm and some new results on the theory of Bregman
functions, Math. Oper. Res. Vol 25, No 2 (2000), pp. 214-230.
[33] Torralba D., Convergence épigraphique et changements d'échelle en analyse variationnelle et optimisation, Thesis, Université de
Montpellier II, 1996.
Miguel Carrasco, Departamento de Ingeniería Matemática, Universidad de Chile, Casilla 170/3, Correo 3, Santiago,
Chile; Département de Mathématiques, Université Montpellier II - Case Courrier 051 - Place Eugène Bataillon - 34095
Montpellier cedex 5, France.
E-mail address: migucarr@dim.uchile.cl