MIT Sloan School of Management
Working Paper 4286-03
February 2003
On an Extension of Condition Number
Theory to Non-Conic Convex Optimization
Robert M. Freund and Fernando Ordóñez
© 2003 by Robert M. Freund and Fernando Ordóñez. All rights reserved.
Short sections of text, not to exceed two paragraphs, may be quoted without
explicit permission, provided that full credit including © notice is given to the source.
This paper also can be downloaded without charge from the
Social Science Research Network Electronic Paper Collection:
http://ssrn.com/abstract=382363
On an Extension of Condition Number Theory to
Non-Conic Convex Optimization
Robert M. Freund∗ and Fernando Ordóñez†
February 14, 2003
Abstract
The purpose of this paper is to extend, as much as possible, the modern theory
of condition numbers for conic convex optimization:

z∗ := minx { c^t x | Ax − b ∈ CY , x ∈ CX } ,

to the more general non-conic format:

(GPd)   z∗ := minx { c^t x | Ax − b ∈ CY , x ∈ P } ,

where P is any closed convex set, not necessarily a cone, which we call the ground-set. Although any convex problem can be transformed to conic form, such transformations are neither unique nor natural given the natural description of many problems, thereby diminishing the relevance of data-based condition number theory. Herein we extend the modern theory of condition numbers to the problem format (GPd). As a byproduct, we are able to state and prove natural extensions of many theorems from the conic-based theory of condition numbers to this broader problem format.
Key words: Condition number, convex optimization, conic optimization, duality, sensitivity analysis, perturbation theory.
∗ MIT Sloan School of Management, 50 Memorial Drive, Cambridge, MA 02142, USA, email: rfreund@mit.edu
† Industrial and Systems Engineering, University of Southern California, GER-247, Los Angeles, CA 90089-0193, USA, email: fordon@usc.edu
1  Introduction
The modern theory of condition numbers for convex optimization problems was developed by Renegar in [16] and [17] for convex optimization problems in the following conic
format:
(CPd)   z∗ := minx { c^t x | Ax − b ∈ CY , x ∈ CX } ,        (1)
where CX ⊆ X and CY ⊆ Y are closed convex cones, A is a linear operator from the
n-dimensional vector space X to the m-dimensional vector space Y, b ∈ Y, and c ∈ X ∗
(the space of linear functionals on X ). The data d for (CPd ) is defined as d := (A, b, c).
The theory of condition numbers for (CPd ) focuses on three measures – ρP (d), ρD (d),
and C(d), to bound various behavioral and computational quantities pertaining to
(CPd ). The quantity ρP (d) is called the “distance to primal infeasibility” and is the
smallest data perturbation ∆d for which (CPd+∆d ) is infeasible. The quantity ρD (d) is
called the “distance to dual infeasibility” for the conic dual (CDd ) of (CPd ):
(CDd)   z^* := maxy { b^t y | c − A^t y ∈ CX∗ , y ∈ CY∗ } ,        (2)
and is defined similarly to ρP (d) but using the conic dual problem instead (which conveniently is of the same general conic format as the primal problem). The quantity C(d)
is called the “condition measure” or the “condition number” of the problem instance d
and is a (positively) scale-invariant reciprocal of the smallest data perturbation ∆d that
will render the perturbed data instance either primal or dual infeasible:
C(d) := ‖d‖ / min{ρP(d), ρD(d)} ,        (3)
for a suitably defined norm k · k on the space of data instances d. A problem is called
“ill-posed” if min{ρP (d), ρD (d)} = 0, equivalently C(d) = ∞. These three condition
measure quantities have been shown in theory to be connected to a wide variety of
bounds on behavioral characteristics of (CPd ) and its dual, including bounds on sizes
of feasible solutions, bounds on sizes of optimal solutions, bounds on optimal objective
values, bounds on the sizes and aspect ratios of inscribed balls in the feasible region,
bounds on the rate of deformation of the feasible region under perturbation, bounds on
changes in optimal objective values under perturbation, and numerical bounds related
to the linear algebra computations of certain algorithms, see [16], [5], [4], [6], [7], [8],
[21], [19], [22], [20], [14], [15]. In the context of interior-point methods for linear and
semidefinite optimization, these same three condition measures have also been shown to
be connected to various quantities of interest regarding the central trajectory, see [10]
and [11]. The connection of these condition measures to the complexity of algorithms
has been shown in [6], [7], [17], [2], and [3], and some of the references contained therein.
The conic format (CPd ) covers a very general class of convex problems; indeed any
convex optimization problem can be transformed to an equivalent instance of (CPd ).
However, such transformations are not necessarily unique and are sometimes rather
unnatural given the “natural” description and the natural data for the problem. The
condition number theory developed in the aforementioned literature pertains only to
convex optimization problems in conic form, and the relevance of this theory is diminished to the extent that many practical convex optimization problems are not conveyed
in conic format. Furthermore, the transformation of a problem to conic form can result
in dramatically different condition numbers depending on the choice of transformation,
see the example in Section 2 of [13].
Motivated to overcome these shortcomings, herein we extend the condition number
theory to non-conic convex optimization problems. We consider the more general format
for convex optimization:
(GPd)   z∗(d) = minx { c^t x | Ax − b ∈ CY , x ∈ P } ,        (4)
where P is allowed to be any closed convex set, possibly unbounded, and possibly without
interior. For example, P could be the solution set of box constraints of the form l ≤ x ≤ u
where some components of l and/or u might be unbounded, or P might be the solution
of network flow constraints of the form N x = g, x ≥ 0. And of course, P might also be a
closed convex cone. We call P the ground-set and we refer to (GPd ) as the “ground-set
model” (GSM) format.
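For illustration (a small instance of our own construction, not used later), a problem in GSM format whose ground-set is neither a cone nor unbounded is

minx∈IR²  x1 + x2   s.t.  x1 − x2 − 1 ∈ IR+ ,  x ∈ P := {x | 0 ≤ x ≤ (1, 1)^t} ,

with data d = (A, b, c) = ([1, −1], 1, (1, 1)^t) and CY = IR+; here (A, b, c) remain the natural data of the problem, while the box P is treated as part of the problem's fixed structure rather than as data.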
We present the definition of the condition number for problem instances of the more
general GSM format in Section 2, where we also demonstrate some basic properties. A
number of results from condition number theory are extended to the GSM format in
the subsequent sections of the paper. In Section 3 we prove that a problem instance
with a finite condition number has primal and dual Slater points, which in turn implies
that strong duality holds for the problem instance and its dual. In Section 4 we provide
characterizations of the condition number as the solution to associated optimization
problems. In Section 5 we show that if the condition number of a problem instance
is finite, then there exist primal and dual interior solutions that have good geometric
properties. In Section 6 we show that the rate of deformation of primal and dual feasible
regions and optimal objective function values due to changes in the data are bounded
by functions of the condition number. Section 7 contains concluding remarks.
We now present the notation and general assumptions that we will use throughout
the paper.
Notation and General Assumptions. We denote the variable space X by IRn and
the constraint space Y by IRm . Therefore, P ⊆ IRn , CY ⊆ IRm , A is an m by n real
matrix, b ∈ IRm , and c ∈ IRn . The spaces X ∗ and Y ∗ of linear functionals on IRn and
IRm can be identified with IRn and IRm , respectively. For v, w ∈ IRn or IRm , we write v t w
for the standard inner product. We denote by D the vector space of all data instances
d = (A, b, c). A particular data instance is denoted equivalently by d or (A, b, c). We
define the norm for a data instance d by kdk := max{kAk, kbk, kck∗ }, where the norms
kxk and kyk on IRn and IRm are given, kAk denotes the usual operator norm, and
k · k∗ denotes the dual norm associated with the norm k · k on IRn or IRm , respectively.
Let B(v, r) denote the ball centered at v with radius r, using the norm for the space
of variables v. For a convex cone S, let S ∗ denote the (positive) dual cone, namely
S ∗ := {s | st x ≥ 0 for all x ∈ S}. Given a set Q ⊂ IRn , we denote the closure and
relative interior of Q by cl Q and relint Q, respectively. We use the convention that if Q
is the singleton Q = {q}, then relint Q = Q. We adopt the standard conventions 1/0 = ∞ and 1/∞ = 0.
We also make the following two general assumptions:
Assumption 1 P 6= ∅ and CY 6= ∅.
Assumption 2 Either CY 6= IRm or P is not bounded (or both).
Clearly if either P = ∅ or CY = ∅, problem (GPd) is infeasible regardless of A, b, and c. Therefore Assumption 1 avoids settings wherein all problem instances are trivially infeasible. Assumption 2 is needed to avoid settings where (GPd) is feasible
for every d = (A, b, c) ∈ D. This will be explained further in Section 2.
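For instance, if CY = IRm then Ax − b ∈ CY holds trivially for every x, so any point of P is feasible for (GPd) no matter what the data d is; if in addition P is bounded, every instance is dual feasible as well (see Proposition 2 below), so every d ∈ D would have infinite distance to infeasibility and the condition number would carry no information.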
2  Condition Numbers for (GPd) and its Dual

2.1  Distance to Primal Infeasibility
We denote the feasible region of (GPd ) by:
Xd := {x ∈ IRn | Ax − b ∈ CY , x ∈ P } .
(5)
Let FP := {d ∈ D | Xd 6= ∅}, i.e., FP is the set of data instances for which (GPd ) has a
feasible solution. Similar to the conic case, the primal distance to infeasibility, denoted
by ρP (d), is defined as:
ρP(d) := inf {‖∆d‖ | Xd+∆d = ∅} = inf {‖∆d‖ | d + ∆d ∈ FP^C} .        (6)
2.2  The Dual Problem and Distance to Dual Infeasibility
In the case when P is a cone, the conic dual problem (2) is of the same basic format as
the primal problem. However, when P is not a cone, we must first develop a suitable
dual problem, which we do in this subsection. Before doing so we introduce a dual
pair of cones associated with the ground-set P . Define the closed convex cone C by
homogenizing P to one higher dimension:
C := cl {(x, t) ∈ IRn × IR | x ∈ tP, t > 0} ,
(7)
and note that C = {(x, t) ∈ IRn × IR | x ∈ tP, t > 0} ∪ (R × {0}) where R is the recession cone of P , namely
R := {v ∈ IRn | there exists x ∈ P for which x + θv ∈ P for all θ ≥ 0} .
(8)
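For example, if P = {x ∈ IRn | x ≥ l} for some fixed l, then R = IRn+; any bounded P has R = {0}; and when P is itself a closed convex cone, R = P.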
It is straightforward to show that the (positive) dual cone C ∗ of C is
C ∗ := {(s, u) ∈ IRn × IR | st x + u · t ≥ 0 for all (x, t) ∈ C}
= {(s, u) ∈ IRn × IR | st x + u ≥ 0, for all x ∈ P }
= {(s, u) ∈ IRn × IR | inf x∈P st x + u ≥ 0} .
(9)
The standard Lagrangian dual of (GPd) can be constructed as:

maxy∈CY∗  infx∈P { c^t x + (b − Ax)^t y } ,

which we re-write as:

maxy∈CY∗  infx∈P { b^t y + (c − A^t y)^t x } .        (10)

With the help of (9) we re-write (10) as:

(GDd)   z^*(d) = maxy,u { b^t y − u | (c − A^t y, u) ∈ C∗ , y ∈ CY∗ } .        (11)
We consider the formulation (11) to be the dual problem of (4). The feasible region of
(GDd ) is:
Yd := {(y, u) ∈ IRm × IR | (c − At y, u) ∈ C ∗ , y ∈ CY∗ }.
(12)
Let FD := {d ∈ D | Yd 6= ∅}, i.e., FD is the set of data instances for which (GDd ) has a
feasible solution. The dual distance to infeasibility, denoted by ρD (d), is defined as:
ρD(d) := inf {‖∆d‖ | Yd+∆d = ∅} = inf {‖∆d‖ | d + ∆d ∈ FD^C} .        (13)
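As a consistency check of this construction, note that when the ground-set is itself a closed convex cone, P = CX, definition (9) gives C∗ = CX∗ × IR+ (since infx∈CX s^t x equals 0 for s ∈ CX∗ and −∞ otherwise); taking u = 0 at an optimal solution, (11) then reduces to the conic dual (2).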
We also present an alternate form of (11), which does not use the auxiliary variable u, based on the function u(·) defined by

u(s) := − infx∈P s^t x .        (14)
It follows from Theorem 5.5 in [18] that u(·), the support function of the set P , is a
convex function. The epigraph of u(·) is:
epi u(·) := {(s, v) ∈ IRn × IR | v ≥ u(s)} ,
and the projection of the epigraph onto the space of the variables s is the effective
domain of u(·):
effdom u(·) := {s ∈ IRn | u(s) < ∞} .
It then follows from (9) that
C ∗ = epi u(·) ,
and so (GDd) can alternatively be written as:

z^*(d) = maxy { b^t y − u(c − A^t y) | c − A^t y ∈ effdom u(·) , y ∈ CY∗ } .        (15)
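For example, for the box P = {x ∈ IRn | 0 ≤ x ≤ e} (e the vector of ones), one obtains u(s) = − inf0≤x≤e s^t x = Σi max{0, −si}, so u(·) is finite everywhere, effdom u(·) = IRn, and the inclusion c − A^t y ∈ effdom u(·) in (15) imposes no restriction.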
Evaluating the inclusion (y, u) ∈ Yd is not necessarily an easy task, as it involves
checking the inclusion (c − At y, u) ∈ C ∗ , and C ∗ is an implicitly defined cone. A very
useful tool for evaluating the inclusion (y, u) ∈ Yd is given in the following proposition,
where recall from (8) that R is the recession cone of P .
Proposition 1 If y satisfies y ∈ CY∗ and c − At y ∈ relintR∗ , then u(c − At y) is finite,
and for all u ≥ u(c − At y) it holds that (y, u) is feasible for (GDd ).
Proof: Note from Proposition 11 of the Appendix that cl effdom u(·) = R∗ and
from Proposition 12 of the Appendix that c − At y ∈ relintR∗ = relint cl effdom u(·) =
relint effdom u(·) ⊆ effdom u(·). This shows that u(c − At y) is finite and (c − At y, u(c −
At y)) ∈ C ∗ . Therefore (y, u) is feasible for (GDd ) for all u ≥ u(c − At y).
2.3  Condition Number
A data instance d = (A, b, c) is consistent if both the primal and dual problems have
feasible solutions. Let F denote the set of consistent data instances, namely F :=
FP ∩ FD = {d ∈ D | Xd 6= ∅ and Yd 6= ∅}. For d ∈ F, the distance to infeasibility is
defined as:

ρ(d) := min{ρP(d), ρD(d)} = inf {‖∆d‖ | Xd+∆d = ∅ or Yd+∆d = ∅} ,        (16)

the interpretation being that ρ(d) is the size of the smallest perturbation of d which will render the perturbed problem instance either primal or dual infeasible. The condition number of the instance d is defined as

C(d) := ‖d‖/ρ(d) if ρ(d) > 0, and C(d) := ∞ if ρ(d) = 0,
which is a (positive) scale-invariant reciprocal of the distance to infeasibility. This definition of condition number for convex optimization problems was first introduced by
Renegar for problems in conic form in [16] and [17].
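As a small worked illustration of these definitions (our own example), consider n = m = 1 with P = IR+, CY = IR+, and d = (A, b, c) = (1, −1, 1), i.e., min{x : x + 1 ≥ 0, x ≥ 0}. A perturbed instance is primal infeasible exactly when b + ∆b > 0 and A + ∆A ≤ 0, and dual infeasible exactly when c + ∆c < 0 and A + ∆A ≥ 0, so ρP(d) = ρD(d) = 1; since ‖d‖ = 1, this instance has C(d) = 1, the smallest possible value (cf. Remark 1 below).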
2.4  Basic Properties of ρP (d), ρD (d), and C(d), and Alternative Duality Results
The need for Assumptions 1 and 2 is demonstrated by the following:
Proposition 2 For any data instance d ∈ D,
1. ρP (d) = ∞ if and only if CY = IRm .
2. ρD (d) = ∞ if and only if P is bounded.
The proof of this proposition relies on Lemmas 1 and 2, which are versions of “theorems of the alternative” for primal and dual feasibility of (GPd ) and (GDd ). These two
lemmas are stated and proved at the end of this section.
Proof of Proposition 2: Clearly CY = IRm implies that ρP (d) = ∞. Also, if P is
bounded, then R = {0} and R∗ = IRn , whereby from Proposition 1 we have that (GDd )
is feasible for any d, and so ρD (d) = ∞. Therefore for both items it only remains to
prove the converse implication. Recall that we denote d = (A, b, c).
Assume that ρP(d) = ∞, and suppose that CY ≠ IRm. Then CY∗ ≠ {0}; consider a point ỹ ∈ CY∗, ỹ ≠ 0. Define the perturbation ∆d = (∆A, ∆b, ∆c) = (−A, −b + ỹ, −c) and d̄ = d + ∆d. Then the point (y, u) = (ỹ, ỹ^t ỹ/2) satisfies the alternative system (A2d̄) of Lemma 1 for the data d̄ = (0, ỹ, 0), whereby Xd̄ = ∅. Therefore ‖d̄ − d‖ ≥ ρP(d) = ∞, a contradiction, and so CY = IRm.
Now assume that ρD (d) = ∞, and suppose that P is not bounded, and so R 6= {0}.
Consider x̃ ∈ R, x̃ 6= 0, and define the perturbation ∆d = (−A, −b, −c − x̃). Then the
point x̃ satisfies the alternative system (B2d¯) of Lemma 2 for the data d¯ = d + ∆d =
(0, 0, −x̃), whereby Yd¯ = ∅. Therefore kd¯ − dk ≥ ρD (d) = ∞, a contradiction, and so P
is bounded.
Remark 1 If d ∈ F, then C(d) ≥ 1.
Proof: Consider the data instance d0 = (0, 0, 0). Note that Xd0 = P ≠ ∅ and Yd0 = CY∗ × IR+ ≠ ∅, therefore d0 ∈ F. If CY ≠ IRm, consider b̃ ∈ IRm \ CY, b̃ ≠ 0, and for any ε > 0 define the instance dε = (0, −εb̃, 0). This instance is such that for any ε > 0, Xdε = ∅, which means that dε ∈ FP^C and therefore ρP(d) ≤ infε>0 ‖d − dε‖ ≤ ‖d‖.
If CY = IRm , then Assumption 2 implies that P is unbounded. This means that there
exists a ray r ∈ R, r 6= 0. For any ε > 0 the instance dε = (0, 0, −εr) is such that
Ydε = ∅, which means that dε ∈ FDC and therefore ρD (d) ≤ inf ε>0 kd − dε k ≤ kdk .
In each case we have ρ(d) = min{ρP(d), ρD(d)} ≤ ‖d‖, which implies the result.
The following two lemmas present weak and strong alternative results for (GPd ) and
(GDd ), and are used in the proofs of Proposition 2 and elsewhere.
Lemma 1 Consider the following systems with data d = (A, b, c):

(Xd):   Ax − b ∈ CY ,  x ∈ P ;

(A1d):  (−A^t y, u) ∈ C∗ ,  b^t y ≥ u ,  y ≠ 0 ,  y ∈ CY∗ ;

(A2d):  (−A^t y, u) ∈ C∗ ,  b^t y > u ,  y ∈ CY∗ .
If system (Xd ) is infeasible, then system (A1d ) is feasible. Conversely, if system
(A2d ) is feasible, then system (Xd ) is infeasible.
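For instance, when P = CX is a cone (so that C∗ = CX∗ × IR+ and u only enters through u ≥ 0), system (A1d) reduces to −A^t y ∈ CX∗, b^t y ≥ 0, y ∈ CY∗, y ≠ 0, and (A2d) to the same system with b^t y > 0; these are the familiar weak and strict Farkas-type alternatives for the conic feasibility system Ax − b ∈ CY, x ∈ CX.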
Proof: Assume that system (Xd ) is infeasible. This implies that
b 6∈ S := {Ax − v | x ∈ P, v ∈ CY } ,
which is a nonempty convex set. Using Proposition 10 we can separate b from S and
therefore there exists y 6= 0 such that
y t (Ax − v) ≤ y t b for all x ∈ P, v ∈ CY .
Set u := y t b, then the inequality implies that y ∈ CY∗ and that (−At y)t x + u ≥ 0 for any
x ∈ P . Therefore (−At y, u) ∈ C ∗ and (y, u) satisfies system (A1d ).
Conversely, if both (A2d) and (Xd) are feasible then

0 ≤ y^t(Ax − b) = (A^t y)^t x − b^t y < −((−A^t y)^t x + u) ≤ 0 .
Lemma 2 Consider the following systems with data d = (A, b, c):

(Yd):   (c − A^t y, u) ∈ C∗ ,  y ∈ CY∗ ;

(B1d):  Ax ∈ CY ,  c^t x ≤ 0 ,  x ≠ 0 ,  x ∈ R ;

(B2d):  Ax ∈ CY ,  c^t x < 0 ,  x ∈ R .
If system (Yd ) is infeasible, then system (B1d ) is feasible. Conversely, if system (B2d )
is feasible, then system (Yd ) is infeasible.
Proof: Assume that system (Yd) is infeasible; this implies that

(0, 0, 0) ∉ S := { (s, v, q) | ∃ y, u s.t. (c − A^t y, u) + (s, v) ∈ C∗ , y + q ∈ CY∗ } ,
which is a nonempty convex set. Using Proposition 10 we separate the point (0, 0, 0) from
S and therefore there exists (x, δ, z) 6= 0 such that xt s + δv + z t q ≥ 0 for all (s, v, q) ∈ S.
For any (y, u), (s̃, ṽ) ∈ C ∗ , and q̃ ∈ CY∗ , define s = −(c − At y) + s̃, v = −u + ṽ, and
q = −y + q̃. By construction (s, v, q) ∈ S and therefore for any y, u, (s̃, ṽ) ∈ C ∗ , q̃ ∈ CY∗
we have
−xt c + (Ax − z)t y + xt s̃ − δu + δṽ + z t q̃ ≥ 0 .
The above inequality implies that δ = 0, Ax = z ∈ CY , x ∈ R, and ct x ≤ 0. In addition
x 6= 0, because otherwise (x, δ, z) = (x, 0, Ax) = 0. Therefore (B1d ) is feasible.
Conversely, if both (B2d ) and (Yd ) are feasible then
0 ≤ xt (c − At y) = ct x − y t Ax < −y t Ax ≤ 0 .
3  Slater Points, Distance to Infeasibility, and Strong Duality
In this section we prove that the existence of a Slater point in either (GPd ) or (GDd ) is
sufficient to guarantee that strong duality holds for these problems. We then show that
a positive distance to infeasibility implies the existence of Slater points, and use these
results to show that strong duality holds whenever ρP (d) > 0 or ρD (d) > 0. We first
state a weak duality result.
Proposition 3 Weak duality holds between (GPd) and (GDd), that is, z^*(d) ≤ z_*(d).

Proof: Consider x and (y, u) feasible for (GPd) and (GDd), respectively. Then

0 ≤ (c − A^t y)^t x + u = c^t x − y^t Ax + u ≤ c^t x − b^t y + u,

where the last inequality follows from y^t(Ax − b) ≥ 0. Therefore z_*(d) ≥ z^*(d).
A classic constraint qualification in the history of constrained optimization is the
existence of a Slater point in the feasible region, see for example Theorem 30.4 of [18]
or Chapter 5 of [1]. We now define a Slater point for problems in the GSM format.
Definition 1 A point x is a Slater point for problem (GPd ) if
x ∈ relintP and Ax − b ∈ relintCY .
A point (y, u) is a Slater point for problem (GDd ) if
y ∈ relintCY∗ and (c − At y, u) ∈ relintC ∗ .
We now present the statements of the main results of this section, deferring the
proofs to the end of the section. The following two theorems show that the existence of
a Slater point in the primal or dual is sufficient to guarantee strong duality as well as
attainment in the dual or the primal problem, respectively.
Theorem 1 If x0 is a Slater point for problem (GPd ), then z∗ (d) = z ∗ (d). If in addition
z∗ (d) > −∞, then Yd 6= ∅ and problem (GDd ) attains its optimum.
Theorem 2 If (y0, u0) is a Slater point for problem (GDd), then z_*(d) = z^*(d). If in addition z^*(d) < ∞, then Xd ≠ ∅ and problem (GPd) attains its optimum.
The next three results show that a positive distance to infeasibility is sufficient to
guarantee the existence of Slater point for the primal and the dual problems, respectively,
and hence is sufficient to ensure that strong duality holds. The fact that a positive
distance to infeasibility implies the existence of an interior point in the feasible region
is shown for the conic case in Theorems 15, 17, and 19 in [8] and Theorem 3.1 in [17].
Theorem 3 Suppose that ρP (d) > 0. Then there exists a Slater point for (GPd ).
Theorem 4 Suppose that ρD (d) > 0. Then there exists a Slater point for (GDd ).
Corollary 1 (Strong Duality) If ρP (d) > 0 or ρD (d) > 0, then z∗ (d) = z ∗ (d). If
ρ(d) > 0, then both the primal and the dual attain their respective optimal values.
Proof: The proof of this result is a straightforward consequence of Theorems 1, 2,
3, and 4.
Note that the contrapositive of Corollary 1 says that if d ∈ F and z∗ (d) > z ∗ (d),
then ρP (d) = ρD (d) = 0 and so ρ(d) = 0. In other words, if a data instance d is primal
and dual feasible but has a positive optimal duality gap, then d must necessarily be
arbitrarily close to being both primal infeasible and dual infeasible.
Proof of Theorem 1: For simplicity, let z∗ and z ∗ denote the primal and dual
optimal objective values, respectively. The interesting case is when z∗ > −∞, otherwise
weak duality implies that (GDd ) is infeasible and z∗ = z ∗ = −∞. If z∗ > −∞ the point
(0, 0, 0) does not belong to the non-empty convex set
S := { (p, q, α) | ∃ x s.t. x + p ∈ P, Ax − b + q ∈ CY, c^t x − α < z_* } .
We use Proposition 10 to properly separate (0, 0, 0) from S, which implies that there
exists (γ, y, π) 6= 0 such that γ t p + y t q + πα ≥ 0 for all (p, q, α) ∈ S. Note that π ≥ 0
because α is not upper bounded in the definition of S.
If π > 0, re-scale (γ, y, π) such that π = 1. For any x ∈ IRn , p̃ ∈ P , q̃ ∈ CY , and
ε > 0 define p = −x + p̃, q = −Ax + b + q̃, and α = ct x − z∗ + ε. By construction the
point (p, q, α) ∈ S and the proper separation implies that for all x, p̃ ∈ P , q̃ ∈ CY , and
ε>0
0 ≤ γ t (−x + p̃) + y t (−Ax + b + q̃) + ct x − z∗ + ε
= (−At y + c − γ)t x + γ t p̃ + y t q̃ + y t b − z∗ + ε .
This expression implies that c − At y = γ, y ∈ CY∗ , and (c − At y, u) ∈ C ∗ for u :=
y t b−z∗ . Therefore (y, u) is feasible for (GDd ) and z ∗ ≥ bt y −u = bt y −y t b+z∗ = z∗ ≥ z ∗ ,
which implies that z ∗ = z∗ and the dual feasible point (y, u) attains the dual optimum.
If π = 0, the same construction used above and proper separation gives the following
inequality for all x, p̃ ∈ P , and q̃ ∈ CY
0 ≤ γ t (−x + p̃) + y t (−Ax + b + q̃)
= (−At y − γ)t x + γ t p̃ + y t q̃ + y t b .
This implies that −At y = γ and y ∈ CY∗ , which implies that −y t Ap̃ + y t q̃ + y t b ≥ 0 for
any p̃ ∈ P , q̃ ∈ CY . Proper separation also guarantees that there exists (p̂, q̂, α̂) ∈ S
such that γ t p̂ + y t q̂ + π α̂ = −y t Ap̂ + y t q̂ > 0.
Let x0 be the Slater point of (GPd) and x̂ such that x̂ + p̂ ∈ P, Ax̂ − b + q̂ ∈ CY, and c^t x̂ − α̂ < z_*. For all |ξ| sufficiently small, x0 + ξ(x̂ + p̂ − x0) ∈ P and Ax0 − b + ξ(Ax̂ − b + q̂ − (Ax0 − b)) ∈ CY. Therefore

0 ≤ −y^t A (x0 + ξ(x̂ + p̂ − x0)) + y^t (Ax0 − b + ξ(Ax̂ − b + q̂ − (Ax0 − b))) + y^t b
  = ξ (−y^t Ax̂ − y^t Ap̂ + y^t Ax0 + y^t Ax̂ − y^t b + y^t q̂ − y^t Ax0 + y^t b)
  = ξ (−y^t Ap̂ + y^t q̂) ,

a contradiction, since ξ can be negative and −y^t Ap̂ + y^t q̂ > 0. Therefore π ≠ 0, completing the proof.
Proof of Theorem 2: For simplicity, let z∗ and z ∗ denote the primal and dual
optimal objective values respectively. The interesting case is when z ∗ < ∞, otherwise
weak duality implies that (GPd ) is infeasible and z∗ = z ∗ = ∞. If z ∗ < ∞ the point
(0, 0, 0, 0) does not belong to the non-empty convex set
S := { (s, v, q, α) | ∃ y, u s.t. (c − A^t y, u) + (s, v) ∈ C∗, y + q ∈ CY∗, b^t y − u + α > z^* } .
We use Proposition 10 to properly separate (0, 0, 0, 0) from S, which implies that there
exists (x, β, γ, δ) 6= 0 such that xt s + βv + γ t q + δα ≥ 0 for all (s, v, q, α) ∈ S. Note that
δ ≥ 0 because α is not upper bounded in the definition of S.
If δ > 0, re-scale (x, β, γ, δ) such that δ = 1. For any y ∈ IRm , u ∈ IR, (s̃, ṽ) ∈ C ∗ ,
q̃ ∈ CY∗ , and ε > 0, define s = −c+At y+s̃, v = −u+ṽ, q = −y+q̃, and α = z ∗ −bt y+u+ε.
By construction the point (s, v, q, α) ∈ S and proper separation implies that for all y, u,
(s̃, ṽ) ∈ C ∗ , q̃ ∈ CY∗ , and ε > 0
0 ≤ xt (−c + At y + s̃) + β(−u + ṽ) + γ t (−y + q̃) + z ∗ − bt y + u + ε
= (Ax − b − γ)t y + (x, β)t (s̃, ṽ) + (1 − β)u + γ t q̃ − ct x + z ∗ + ε .
This implies that Ax − b = γ ∈ CY , β = 1, ct x ≤ z ∗ , and (x, 1) ∈ C, which means that
x ∈ P . Therefore x is feasible for (GPd ) and z ∗ ≥ ct x ≥ z∗ ≥ z ∗ , which implies that
z ∗ = z∗ and the primal feasible point x attains the optimum.
If δ = 0, the same construction used above and proper separation gives the following
inequality for all y, u, (s̃, ṽ) ∈ C ∗ , q̃ ∈ CY∗
0 ≤ xt (−c + At y + s̃) + β(−u + ṽ) + γ t (−y + q̃)
= (Ax − γ)t y + (x, β)t (s̃, ṽ) − βu + γ t q̃ − ct x .
This implies that Ax = γ ∈ CY , β = 0, which means that xt s̃ + xt At q̃ − ct x ≥ 0 for
any (s̃, ũ) ∈ C ∗ and q̃ ∈ CY∗ . The proper separation also guarantees that there exists
(ŝ, v̂, q̂, α̂) ∈ S such that xt ŝ + βv̂ + γ t q̂ = xt ŝ + xt At q̂ > 0.
Let (y0, u0) be the Slater point of (GDd) and (ŷ, û) such that (c − A^t ŷ + ŝ, û + v̂) ∈ C∗, ŷ + q̂ ∈ CY∗, and b^t ŷ − û + α̂ > z^*. Then for all |ξ| sufficiently small, we have that y0 + ξ(ŷ + q̂ − y0) ∈ CY∗ and

(c − A^t y0 + ξ(c − A^t ŷ + ŝ − c + A^t y0) , u0 + ξ(û + v̂ − u0)) ∈ C∗ .

Therefore

x^t (c − A^t y0 + ξ(c − A^t ŷ + ŝ − c + A^t y0)) + x^t A^t (y0 + ξ(ŷ + q̂ − y0)) − c^t x ≥ 0 .

Simplifying and canceling, we obtain

0 ≤ ξ (−x^t A^t ŷ + x^t ŝ + x^t A^t y0 + x^t A^t ŷ + x^t A^t q̂ − x^t A^t y0) = ξ (x^t ŝ + x^t A^t q̂) ,

a contradiction, since ξ can be negative and x^t ŝ + x^t A^t q̂ > 0. Therefore δ ≠ 0, completing the proof.
Proof of Theorem 3: Equation (6) and ρP (d) > 0 imply that Xd 6= ∅. Assume that Xd
contains no Slater point, then relintCY ∩{Ax − b | x ∈ relintP } = ∅ and these nonempty
convex sets can be separated using Proposition 10. Therefore there exists y 6= 0 such
that for any s ∈ CY , x ∈ P we have
y t s ≥ y t (Ax − b) .
Let u = y^t b; from the inequality above we have that y ∈ CY∗ and −y^t Ax + u ≥ 0 for any x ∈ P, which implies that (−A^t y, u) ∈ C∗. Define bε = b + (ε/‖y‖∗) ŷ, with ŷ given by Proposition 9 such that ‖ŷ‖ = 1 and ŷ^t y = ‖y‖∗. Then the point (y, u) is feasible for Problem (A2dε) of Lemma 1 with data dε = (A, bε, c) for any ε > 0. This implies that Xdε = ∅ and therefore ρP(d) ≤ infε>0 ‖d − dε‖ = infε>0 ε/‖y‖∗ = 0, a contradiction.
Proof of Theorem 4: Equation (13) and ρD (d) > 0 imply that Yd 6= ∅. Assume that
Yd contains no Slater point. Consider the nonempty convex set S defined by:
S := { (c − A^t y, u) | y ∈ relint CY∗ , u ∈ IR } .
No Slater point in the dual implies that relintC ∗ ∩ S = ∅. Therefore we can properly
separate these two nonempty convex sets using Proposition 10, whereby there exists
(x, t) 6= 0 such that for any (s, v) ∈ C ∗ , y ∈ CY∗ , u ∈ IR we have
xt s + tv ≥ xt c − At y + tu .
The above inequality implies that Ax ∈ CY, c^t x ≤ 0, (x, t) ∈ C, and t = 0. This last fact implies that x ≠ 0 and x ∈ R. Let x̂ be such that ‖x̂‖∗ = 1 and x̂^t x = ‖x‖ (see Proposition 9). For any ε > 0, define cε = c − (ε/‖x‖) x̂. Then the point x is feasible for Problem (B2dε) of Lemma 2 with data dε = (A, b, cε). This implies that Ydε = ∅ and consequently ρD(d) ≤ infε>0 ‖d − dε‖ = infε>0 ε/‖x‖ = 0, a contradiction.
The contrapositives of Theorems 3 and 4 are not true. Consider for example the data

A = 0 (the 2 × 2 zero matrix) ,   b = (−1, 0)^t ,   and   c = (1, 0)^t ,
and the sets CY = IR+ × {0} and P = CX = IR+ × IR. Problem (GPd ) for this example
has a Slater point at (1, 0), and ρP (d) = 0 (perturbing by ∆b = (0, ε) makes the problem
infeasible for any ε.) Problem (GDd ) for the same example has a Slater point at (1, 0)
and ρD (d) = 0 (perturbing by ∆c = (0, ε) makes the problem infeasible for any ε).
4  Characterization of ρP (d) and ρD (d) via Associated Optimization Problems
Equation (16) shows that to characterize ρ(d) for consistent data instances d ∈ F, it
is sufficient to express ρP (d) and ρD (d) in a convenient form. Below we show that
these distances to infeasibility can be obtained as the solutions of certain associated
optimization problems. These results can be viewed as an extension to problems not in
conic form of Theorem 3.5 of [17], and Theorems 1 and 2 of [8].
Theorem 5 Suppose that Xd ≠ ∅. Then ρP(d) = jP(d) = rP(d), where

jP(d) = min { max { ‖A^t y + s‖∗ , |b^t y − u| }  :  ‖y‖∗ = 1, y ∈ CY∗, (s, u) ∈ C∗ }        (17)

and

rP(d) = min { max { θ : Ax − bt − vθ ∈ CY, ‖x‖ + |t| ≤ 1, (x, t) ∈ C }  :  ‖v‖ ≤ 1, v ∈ IRm } .        (18)
Theorem 6 Suppose that Yd ≠ ∅. Then ρD(d) = jD(d) = rD(d), where

jD(d) = min { max { ‖Ax − p‖ , |c^t x + g| }  :  ‖x‖ = 1, x ∈ R, p ∈ CY, g ≥ 0 }        (19)

and

rD(d) = min { max { θ : −A^t y + cδ − θv ∈ R∗, ‖y‖∗ + |δ| ≤ 1, y ∈ CY∗, δ ≥ 0 }  :  ‖v‖∗ ≤ 1, v ∈ IRn } .        (20)
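For a quick sanity check of these characterizations (our own example), take the one-dimensional data d = (A, b, c) = (1, −1, 1) with P = CY = IR+ (so that C∗ = IR+ × IR+ and R = IR+). Then (17) forces y = 1 and gives jP(d) = mins,u≥0 max{|1 + s|, |−1 − u|} = 1, while (19) forces x = 1 and gives jD(d) = minp,g≥0 max{|1 − p|, |1 + g|} = 1; both agree with the distances to infeasibility ρP(d) = ρD(d) = 1 of this instance.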
Proof of Theorem 5: Assume that jP (d) > ρP (d), then there exists a data instance
d¯ = (Ā, b̄, c̄) that is primal infeasible and kA − Āk < jP (d), kb − b̄k < jP (d), and
kc − c̄k∗ < jP (d). From Lemma 1 there is a point (ȳ, ū) that satisfies the following:
(−Āt ȳ, ū) ∈ C ∗
b̄t ȳ ≥ ū
ȳ 6= 0
ȳ ∈ CY∗ .
Scale ȳ such that kȳk∗ = 1, then (y, s, u) = (ȳ, −Āt ȳ, b̄t ȳ) is feasible for (17) and
kAt y + sk∗ = kAt ȳ − Āt ȳk∗ ≤ kA − Ākkȳk∗ < jP (d)
|bt y − u| = |bt ȳ − b̄t ȳ| ≤ kb − b̄kkȳk∗ < jP (d).
In the first inequality above we used the fact that kAt k∗ = kAk. Therefore jP (d) ≤
max {kAt y + sk∗ , |bt y − u|} < jP (d), a contradiction.
Let us now assume that jP (d) < γ < ρP (d) for some γ. This means that there exists
(ȳ, s̄, ū) such that ȳ ∈ CY∗ , kȳk∗ = 1, (s̄, ū) ∈ C ∗ , and that
kAt ȳ + s̄k∗ < γ,
|bt ȳ − ū| < γ.
From Proposition 9, consider ŷ such that kŷk = 1 and ŷ t ȳ = kȳk∗ = 1, and define, for
ε > 0,
Ā = A − ŷ ((At ȳ)t + s̄t )
b̄ε = b − ŷ (bt ȳ − ū − ε) .
We have that ȳ ∈ CY∗ , −Āt ȳ = s̄, b̄tε ȳ = ū + ε > ū, and (−Āt ȳ, ū) ∈ C ∗ . This implies
that for any ε > 0, the problem (A2d¯ε ) in Lemma 1 is feasible with data d¯ε = (Ā, b̄ε , c).
Lemma 1 then implies that Xd¯ε = ∅ and therefore ρP (d) ≤ kd − d¯ε k. To finish the proof
we compute the size of the perturbation:
‖A − Ā‖ = ‖ŷ ((A^t ȳ)^t + s̄^t)‖ ≤ ‖A^t ȳ + s̄‖∗ ‖ŷ‖ < γ

‖b − b̄ε‖ = |b^t ȳ − ū − ε| ‖ŷ‖ ≤ |b^t ȳ − ū| + ε < γ + ε,

which implies ρP(d) ≤ ‖d − d̄ε‖ = max{‖A − Ā‖, ‖b − b̄ε‖} < γ + ε < ρP(d), for ε small enough. This is a contradiction, whereby jP(d) = ρP(d).
To prove the other characterization, we note that θ ≥ 0 in Problem (18) and invoke Lemma 6 to rewrite it as

rP(d) = min { min { max { ‖A^t y + s‖∗ , |−b^t y + u| }  :  y^t v ≥ 1, y ∈ CY∗, (s, u) ∈ C∗ }  :  ‖v‖ ≤ 1, v ∈ IRm } .
The above problem can be written as the following equivalent optimization problem:

rP(d) = min { max { ‖A^t y + s‖∗ , |−b^t y + u| }  :  ‖y‖∗ ≥ 1, y ∈ CY∗, (s, u) ∈ C∗ } .

The equivalence of these problems is verified by combining the minimization operations in the first problem and using the Cauchy-Schwartz inequality. The converse makes use of Proposition 9. To finish the proof, we note that if (y, s, u) is optimal for this last problem then it also satisfies ‖y‖∗ = 1, making it equivalent to (17). Therefore

rP(d) = min { max { ‖A^t y + s‖∗ , |−b^t y + u| }  :  ‖y‖∗ = 1, y ∈ CY∗, (s, u) ∈ C∗ } = jP(d) .
Proof of Theorem 6: Assume that jD (d) > ρD (d), then there exists a data instance
d¯ = (Ā, b̄, c̄) that is dual infeasible and kA − Āk < jD (d), kb − b̄k < jD (d), and kc − c̄k∗ <
jD (d). From Lemma 2 there exists x̄ ∈ R such that x̄ 6= 0, Āx̄ ∈ CY , and c̄t x̄ ≤ 0. We
can scale x̄ such that kx̄k = 1. Then (x, p, g) = (x̄, Āx̄, −c̄t x̄) is feasible for (19), and
kAx − pk = kAx − Āxk ≤ kA − Ākkxk < jD (d)
|ct x + g| = |ct x − c̄t x| ≤ kc − c̄k∗ kxk < jD (d) .
Therefore, jD (d) ≤ max {kAx − pk, |ct x + g|} < jD (d), which is a contradiction.
Assume now that jD (d) < γ < ρD (d) for some γ. Then there exists (x̄, p̄, ḡ) such
that x̄ ∈ R, kx̄k = 1, p̄ ∈ CY , and ḡ ≥ 0, and that kAx̄ − p̄k ≤ γ and |ct x̄ + ḡ| ≤ γ.
From Proposition 9, consider x̂ such that kx̂k∗ = 1 and x̂t x̄ = kx̄k = 1, and define:
Ā = A − (Ax̄ − p̄) x̂t and c̄ε = c − x̂(ct x̄ + ḡ + ε), for ε > 0. By construction Āx̄ = p̄ ∈ CY
and c̄tε x̄ = −ḡ − ε < 0, for any ε > 0. Therefore Problem (B2d¯ε ) in Lemma 2 is feasible
for data d¯ε = (Ā, b, c̄ε ), which implies that Yd¯ε = ∅. We can then bound ρD (d) as follows:
ρD(d) ≤ ‖d − d̄ε‖ = max { ‖(Ax̄ − p̄) x̂^t‖ , ‖x̂(c^t x̄ + ḡ + ε)‖∗ } ≤ max{γ, γ + ε} = γ + ε < ρD(d)
for ε small enough, which is a contradiction. Therefore ρD (d) = jD (d).
To prove the other characterization, we note that θ ≥ 0 in Problem (20) and invoke Lemma 6 to rewrite it as

rD(d) = min { min { max { ‖−Ax + p‖ , |c^t x + g| }  :  x^t v ≥ 1, x ∈ R, p ∈ CY, g ≥ 0 }  :  ‖v‖∗ ≤ 1, v ∈ IRn } .

The above problem can be written as the following equivalent optimization problem:

rD(d) = min { max { ‖−Ax + p‖ , |c^t x + g| }  :  ‖x‖ ≥ 1, x ∈ R, p ∈ CY, g ≥ 0 } .

The equivalence of these problems is verified by combining the minimization operations in the first problem and using the Cauchy-Schwartz inequality. The converse makes use of Proposition 9. To finish the proof, we note that if (x, p, g) is optimal for this last problem then it also satisfies ‖x‖ = 1, making it equivalent to (19). Therefore

rD(d) = min { max { ‖−Ax + p‖ , |c^t x + g| }  :  ‖x‖ = 1, x ∈ R, p ∈ CY, g ≥ 0 } = jD(d) .
5  Geometric Properties of the Primal and Dual Feasible Regions
In Section 3 we showed that a positive primal and/or dual distance to infeasibility implies
the existence of a primal and/or dual Slater point, respectively. We now show that a
positive distance to infeasibility also implies that the corresponding feasible region has a
reliable solution. We consider a solution in the relative interior of the feasible region to
be a reliable solution if it has good geometric properties: it is not too far from a given
reference point, its distance to the relative boundary of the feasible region is not too
small, and the ratio of these two quantities is not too large, where these quantities are
bounded by appropriate condition numbers.
5.1  Distance to Relative Boundary, Minimum Width of Cone
An affine set T is the translation of a vector subspace L, i.e., T = a + L for some a.
The minimal affine set that contains a given set S is known as the affine hull of S. We
denote the affine hull of S by LS ; it is characterized as:
LS = { Σi∈I αi xi | αi ∈ IR, xi ∈ S, Σi∈I αi = 1, I a finite set } ,

see Section 1 in [18]. We denote by L̂S the vector subspace obtained when the affine hull LS is translated to contain the origin; i.e., for any x ∈ S, L̂S = LS − x. Note that if 0 ∈ S then LS is a subspace.
Many results in this section involve the distance of a point x ∈ S to the relative
boundary of the set S, denoted by dist(x, rel∂S), defined as follows:
Definition 2 Given a non-empty set S and a point x ∈ S, the distance from x to the
relative boundary of S is
dist(x, rel∂S) := infx̄ { ‖x − x̄‖ | x̄ ∈ LS \ S } .        (21)
Note that if S is an affine set (and in particular if S is the singleton S = {s}), then
dist(x, rel∂S) = ∞ for each x ∈ S.
We use the following definition of the min-width of a convex cone:
Definition 3 For a convex cone K, the min-width of K is defined by

τK := sup { dist(y, rel∂K)/‖y‖ | y ∈ K, y ≠ 0 } ,

for K ≠ {0}, and τK := ∞ if K = {0}.
The measure τK maximizes the ratio of the radius of a ball contained in the relative
interior of K and the norm of its center, and so it intuitively corresponds to half of the
vertex angle of the widest cylindrical cone contained in K. The quantity τK was called
the “inner measure” of K for Euclidean norms in Goffin [9], and has been used more
recently for general norms in analyzing condition measures for conic convex optimization,
see [6]. Note that if K is not a subspace, then τK ∈ (0, 1], and τK is attained for some
y 0 ∈ relintK satisfying ky 0 k = 1, as well as along the ray αy 0 for all α > 0; and τK takes
on larger values to the extent that K has larger minimum width. If K is a subspace,
then τK = ∞.
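For example, for the nonnegative orthant K = IRm+ one has τK = 1/√m under the Euclidean norm (attained along y0 = (1, . . . , 1)/√m), and τK = 1 under the ℓ∞ norm (attained along y0 = (1, . . . , 1)).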
5.2  Geometric Properties of the Feasible Region of GPd
In this subsection we present results concerning geometric properties of the feasible
region Xd of (GPd ). We defer all proofs to the end of the subsection.
The following proposition is an extension of Lemma 3.2 of [16] to the ground-set
model format.
Proposition 4 Consider any x = x̂ + r feasible for (GPd) such that x̂ ∈ P and r ∈ R. If ρD(d) > 0 then

‖r‖ ≤ (1/ρD(d)) max { ‖Ax̂ − b‖ , c^t r } .
The following result is an extension of Assertion 1 of Theorem 1.1 of [16] to the
ground-set model format of (GPd ):
Proposition 5 Consider any x0 ∈ P. If ρP(d) > 0 then there exists x̄ ∈ Xd satisfying

‖x̄ − x0‖ ≤ (dist(Ax0 − b, CY)/ρP(d)) max { 1, ‖x0‖ } .
The following is the main result of this subsection, and can be viewed as an extension
of Theorems 15, 17, and 19 of [8] to the ground-set model format of (GPd ). In Theorem
7 we assume for expository convenience that P is not an affine set and CY is not a
subspace. These assumptions are relaxed in Theorem 8.
Theorem 7 Suppose that P is not an affine set, CY is not a subspace, and consider any x0 ∈ P. If ρP(d) > 0 then there exists x̄ ∈ Xd satisfying:

1. (a) ‖x̄ − x0‖ ≤ ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}

   (b) ‖x̄‖ ≤ ‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d)

2. (a) 1/dist(x̄, rel∂P) ≤ (1/dist(x0, rel∂P)) (1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d))

   (b) 1/dist(x̄, rel∂Xd) ≤ (1/min{dist(x0, rel∂P), τCY}) (1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d))

3. (a) ‖x̄ − x0‖/dist(x̄, rel∂P) ≤ (1/dist(x0, rel∂P)) ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}

   (b) ‖x̄ − x0‖/dist(x̄, rel∂Xd) ≤ (1/min{dist(x0, rel∂P), τCY}) ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}

   (c) ‖x̄‖/dist(x̄, rel∂P) ≤ (1/dist(x0, rel∂P)) (‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d))

   (d) ‖x̄‖/dist(x̄, rel∂Xd) ≤ (1/min{dist(x0, rel∂P), τCY}) (‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d))
The statement of Theorem 8 below relaxes the assumptions on P and CY not being
affine and/or linear spaces:
Theorem 8 Consider any x0 ∈ P . If ρP (d) > 0 then there exists x̄ ∈ Xd with the
following properties:
• If P is not an affine set, x̄ satisfies all items of Theorem 7.
• If P is an affine set and CY is not a subspace, x̄ satisfies all items of Theorem
7, where items 2.(a), 3.(a), and 3.(c) are vacuously valid as both sides of these
inequalities are zero.
• If P is an affine set and CY is a subspace, x̄ satisfies all items of Theorem 7, where
items 2.(a), 2.(b), 3.(a), 3.(b), 3.(c), and 3.(d) are vacuously valid as both sides of
these inequalities are zero.
We conclude this subsection by presenting a result which captures the thrust of
Theorems 7 and 8, emphasizing how the distance to infeasibility ρP (d) and the geometric
properties of a given point x0 ∈ P bound various geometric properties of the feasible
region Xd . For x0 ∈ P , define the following measure:
gP,CY(x0) := max{‖x0‖, 1} / min{1, dist(x0, rel∂P), τCY} .

Also define the following geometric measure of the feasible region Xd:

gXd := minx∈Xd max { ‖x‖ , 1/dist(x, rel∂Xd) , ‖x‖/dist(x, rel∂Xd) } .

The following is an immediate consequence of Theorems 7 and 8.

Corollary 2 Consider any x0 ∈ P. If ρP(d) > 0 then

gXd ≤ gP,CY(x0) (1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d)) .
We now proceed with proofs of these results.
Proof of Proposition 4: If r = 0 the result is true. If r 6= 0, then Proposition 9 shows
that there exists r̂ such that kr̂k∗ = 1 and r̂t r = krk. For any ε > 0 define the following
perturbed problem instance:
Ā = A + (1/‖r‖)(Ax̂ − b) r̂^t ,   b̄ = b ,   c̄ = c + ((−(c^t r)^+ − ε)/‖r‖) r̂ .

Note that, for the data d̄ = (Ā, b̄, c̄), the point r satisfies (B2d̄) in Lemma 2, and therefore (GDd̄) is infeasible. We conclude that ρD(d) ≤ ‖d − d̄‖, which implies

ρD(d) ≤ max { ‖Ax̂ − b‖ , (c^t r)^+ + ε } / ‖r‖ ,

and so

ρD(d) ≤ max { ‖Ax̂ − b‖ , c^t r } / ‖r‖ .
The following technical lemma, which concerns the optimization problem (P P ) below, is used in the subsequent proofs. Problem (P P ) is parametrized by given points
x0 ∈ P and w0 ∈ CY , and is defined by
(PP)   maxx,t,w,θ  θ
       s.t.  Ax − bt − w = θ (b − Ax0 + w0)
             ‖x‖ + |t| ≤ 1
             (x, t) ∈ C
             w ∈ CY .        (22)

Lemma 3 Consider any x0 ∈ P and w0 ∈ CY such that Ax0 − w0 ≠ b. If ρP(d) > 0, then there exists a point (x, t, w, θ) feasible for problem (PP) that satisfies

θ ≥ ρP(d) / ‖b − Ax0 + w0‖ > 0 .        (23)
(23)
Proof: Note that problem (P P ) is feasible for any x0 and w0 since (x, t, w, θ) =
(0, 0, 0, 0) is always feasible, therefore it can either be unbounded or have a finite optimal
objective value. If (P P ) is unbounded, we can find feasible points with an objective
function large enough such that (23) holds. If (P P ) has a finite optimal value, say
θ∗ , then it follows from elementary arguments that it attains its optimal value. Since
ρP (d) > 0 implies Xd 6= ∅, Theorem 5 implies that the optimal solution (x∗ , t∗ , w∗ , θ∗ )
for (P P ) satisfies (23).
Proof of Proposition 5: Assume Ax0 − b 6∈ CY , otherwise x̄ = x0 satisfies the
proposition. We consider problem (P P ), defined by (22), with x0 and w0 ∈ CY such
that ‖Ax0 − b − w0‖ = dist(Ax0 − b, CY). From Lemma 3 we have that there exists a point (x, t, w, θ) feasible for (PP) that satisfies

θ ≥ ρP(d)/‖b − Ax0 + w0‖ = ρP(d)/dist(Ax0 − b, CY) .

Define

x̄ = (x + θx0)/(t + θ)   and   w̄ = (w + θw0)/(t + θ) .

By construction we have x̄ ∈ P, Ax̄ − b = w̄ ∈ CY, therefore x̄ ∈ Xd, and

‖x̄ − x0‖ = ‖x − tx0‖/(t + θ) ≤ ((‖x‖ + t) max{1, ‖x0‖})/θ ≤ (dist(Ax0 − b, CY)/ρP(d)) max{1, ‖x0‖} .
Proof of Theorem 7: Note that ρP (d) > 0 implies Xd 6= ∅; note also that ρP (d) is finite,
otherwise Proposition 2 shows that CY = IRm which is a subspace. Set w0 ∈ CY such
that ‖w0‖ = ‖A‖ and τCY = dist(w0, rel∂CY)/‖w0‖. We also assume that Ax0 − b ≠ w0, otherwise we can show that x̄ = x0 satisfies the theorem. Let rw0 = dist(w0, rel∂CY) = ‖A‖ τCY and let also rx0 = dist(x0, rel∂P). We invoke Lemma 3 with x0 and w0 above to obtain a point (x, t, w, θ), feasible for (PP), that from inequality (23) satisfies

0 < 1/θ ≤ (‖Ax0 − b‖ + ‖A‖)/ρP(d) .        (24)
Define the following:

x̄ = (x + θx0)/(t + θ) ,   w̄ = (w + θw0)/(t + θ) ,   rx̄ = θ rx0/(t + θ) ,   rw̄ = θ τCY/(t + θ) .
By construction dist(x̄, rel∂P ) ≥ rx̄ , dist(w̄, rel∂CY ) ≥ rw̄ kAk, and Ax̄−b = w̄ ∈ CY .
Therefore the point x̄ ∈ Xd . We now bound its distance to the relative boundary of the
feasible region.
Consider any v ∈ L̂P ∩ {y | Ay ∈ L̂CY} such that ‖v‖ ≤ 1; then

x̄ + αv ∈ P , for any |α| ≤ rx̄ ,

and

A(x̄ + αv) − b = w̄ + α(Av) ∈ CY , for any |α| ≤ rw̄ .

Therefore (x̄ + αv) ∈ Xd for any |α| ≤ min{rx̄, rw̄}, and the distance to the relative boundary of Xd is then dist(x̄, rel∂Xd) ≥ |α|‖v‖ ≥ |α|, for any |α| ≤ min{rx̄, rw̄}. Therefore dist(x̄, rel∂Xd) ≥ min{rx̄, rw̄} ≥ θ min{rx0, τCY}/(t + θ).
To finish the proof, we just have to bound the different expressions from the statement
of the theorem; here we make use of inequality (24):
1. (a) ‖x̄ − x0‖ = ‖x − tx0‖/(t + θ) ≤ (1/θ) max{1, ‖x0‖} ≤ ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}.

   (b) ‖x̄‖ ≤ ‖x‖(1/θ) + ‖x0‖ ≤ 1/θ + ‖x0‖ ≤ ‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d).

2. (a) 1/dist(x̄, rel∂P) ≤ 1/rx̄ = (t + θ)/(θ rx0) ≤ (1/rx0)(1 + 1/θ) ≤ (1/rx0)(1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).

   (b) 1/dist(x̄, rel∂Xd) ≤ (t + θ)/(θ min{rx0, τCY}) ≤ (1/min{rx0, τCY})(1 + 1/θ) ≤ (1/min{rx0, τCY})(1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).

3. (a) ‖x̄ − x0‖/dist(x̄, rel∂P) ≤ ‖x − tx0‖/(θ rx0) ≤ (1/rx0)(1/θ) max{1, ‖x0‖} ≤ (1/rx0)((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}.

   (b) ‖x̄ − x0‖/dist(x̄, rel∂Xd) ≤ ‖x − tx0‖/(θ min{rx0, τCY}) ≤ (1/min{rx0, τCY})(1/θ) max{1, ‖x0‖} ≤ (1/min{rx0, τCY})((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}.

   (c) ‖x̄‖/dist(x̄, rel∂P) ≤ ‖x + θx0‖/(θ rx0) ≤ (1/rx0)(‖x0‖ + 1/θ) ≤ (1/rx0)(‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).

   (d) ‖x̄‖/dist(x̄, rel∂Xd) ≤ ‖x + θx0‖/(θ min{rx0, τCY}) ≤ (1/min{rx0, τCY})(‖x0‖ + 1/θ) ≤ (1/min{rx0, τCY})(‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).
Finally, we note that Theorem 8 can be proved using almost identical arguments as
in the proof of Theorem 7, but with a careful analysis to handle the special cases when
P is an affine set or CY is a subspace, see [12] for exact details.
5.3  Solutions in the relative interior of Yd
In this subsection we present results concerning geometric properties of the dual feasible
region Yd of (GDd ). We defer all proofs to the end of the subsection. Before proceeding,
we first discuss norms that arise when studying the dual problem. Motivated quite
naturally by (18), we define the norm k(x, t)k := kxk+|t| for points (x, t) ∈ C ⊂ IRn ×IR.
This then leads to the following dual norm for points (s, u) ∈ C ∗ ⊂ IRn × IR:
k(s, u)k∗ := max{ksk∗ , |u|} .
(25)
Consistent with the characterization of ρD (d) given by (20) in Theorem 6, we define
the following dual norm for points (y, δ) ∈ IRm × IR:
k(y, δ)k∗ := kyk∗ + |δ| .
(26)
It is clear that the above defines a norm on the vector space IRm × IR which contains Yd .
The following proposition bounds the norm of the y component of the dual feasible
solution (y, u) in terms of the objective function value bt y − u; it corresponds to Lemma
3.1 of [16] for the ground-set model format.
Proposition 6 Consider any (y, u) feasible for (GDd). If ρP(d) > 0 then

‖y‖∗ ≤ max { ‖c‖∗ , −(b^t y − u) } / ρP(d) .
The following result corresponds to Assertion 1 of Theorem 1.1 of [16] for the ground-set model format dual problem (GDd):

Proposition 7 Consider any y0 ∈ CY∗. If ρD(d) > 0 then for any ε > 0, there exists (ȳ, ū) ∈ Yd satisfying

‖ȳ − y0‖ ≤ ((dist(c − A^t y0, R∗) + ε)/ρD(d)) max { 1, ‖y0‖ } .
The following is the main result of this subsection, and can be viewed as an extension
of Theorems 15, 17, and 19 of [8] to the dual problem (GDd ). In Theorem 9 we assume
for expository convenience that CY is not a subspace and that R (the recession cone of
P ) is not a subspace. These assumptions are relaxed in Theorem 10.
Theorem 9 Suppose that R and CY are not subspaces and consider any y 0 ∈ CY∗ . If
ρD (d) > 0 then for any ε > 0, there exists (ȳ, ū) ∈ Yd satisfying:
1. (a) ‖ȳ − y0‖∗ ≤ ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}

   (b) ‖ȳ‖∗ ≤ ‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)

2. (a) 1/dist(ȳ, rel∂CY∗) ≤ (1/dist(y0, rel∂CY∗)) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))

   (b) 1/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε) max{1, ‖A‖}/min{dist(y0, rel∂CY∗), τR∗}) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))

3. (a) ‖ȳ − y0‖∗/dist(ȳ, rel∂CY∗) ≤ (1/dist(y0, rel∂CY∗)) ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}

   (b) ‖ȳ − y0‖∗/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε) max{1, ‖A‖}/min{dist(y0, rel∂CY∗), τR∗}) ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}

   (c) ‖ȳ‖∗/dist(ȳ, rel∂CY∗) ≤ (1/dist(y0, rel∂CY∗)) (‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))

   (d) ‖ȳ‖∗/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε) max{1, ‖A‖}/min{dist(y0, rel∂CY∗), τR∗}) (‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))
The statement of Theorem 10 below relaxes the assumptions on R and CY not being
linear subspaces:
Theorem 10 Consider any y 0 ∈ CY∗ . If ρD (d) > 0 then for any ε > 0, there exists
(ȳ, ū) ∈ Yd with the following properties:
• If CY is not a subspace, (ȳ, ū) satisfies all items of Theorem 9.
• If CY is a subspace and R is not a subspace, (ȳ, ū) satisfies all items of Theorem
9, where items 2.(a), 3.(a), and 3.(c) are vacuously valid as both sides of these
inequalities are zero.
• If CY and R are subspaces, (ȳ, ū) satisfies items 1.(a), 1.(b), 2.(a), 3.(a), and 3.(c)
of Theorem 9, where items 2.(a), 3.(a), and 3.(c) are vacuously valid as both sides
of these inequalities are zero. The point (ȳ, ū) also satisfies
2'.(b)  1/dist((ȳ, ū), rel∂Yd) ≤ ε

3'.(b)  ‖ȳ − y0‖∗/dist((ȳ, ū), rel∂Yd) ≤ ε

3'.(d)  ‖ȳ‖∗/dist((ȳ, ū), rel∂Yd) ≤ ε
We conclude this subsection by presenting a result which captures the thrust of
Theorems 9 and 10, emphasizing how the distance to dual infeasibility ρD (d) and the
geometric properties of a given point y 0 ∈ CY∗ bound various geometric properties of the
dual feasible region Yd . For y 0 ∈ relintCY∗ , define:
gCY∗,R∗(y0) := max{‖y0‖∗, 1} / min{1, dist(y0, rel∂CY∗), τR∗} .
We now define a geometric measure for the dual feasible region. We do not consider
the whole set Yd ; instead we consider only the projection onto the variables y. Let ΠYd
denote the projection of Yd onto the space of the y variables:
ΠYd := {y ∈ IRm | there exists u ∈ IR for which (y, u) ∈ Yd } .
(27)
Note that the set ΠYd corresponds exactly to the feasible region in the alternate formulation of the dual problem (15). We define the following geometric measure of the set
ΠYd :
gYd := inf(y,u)∈Yd max { ‖y‖∗ , 1/dist(y, rel∂ΠYd) , ‖y‖∗/dist(y, rel∂ΠYd) } .
Corollary 3 Consider any y0 ∈ CY∗. If ρD(d) > 0 then

gYd ≤ max{1, ‖A‖} gCY∗,R∗(y0) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) .
Proof: We show in Lemma 4, item 4, that for any (ȳ, ū) ∈ Yd , dist(ȳ, rel∂ΠYd ) ≥
dist((ȳ, ū), rel∂Yd ). If either CY or R is not a subspace, use items 1.(b), 2.(b), and 3.(d)
from Theorem 9 and apply the definition of gYd to obtain
gYd ≤ (1 + ε) max{1, ‖A‖} gCY∗,R∗(y0) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) .
Since now the left side is independent of ε, take the limit as ε → 0. If both CY and R
are subspaces we obtain the stronger bound
gYd ≤ gCY∗,R∗(y0) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))
by using item 1.(b) from Theorem 9, items 2’.(b) and 3’.(d) from Theorem 10, and the
definition of gYd .
We now state Lemma 4; we start by defining the following set:

Ỹd := { (y, u) ∈ IRm × IR | (c − A^t y, u) ∈ C∗ } .        (28)
Note that the dual feasible region Yd is recovered from Ỹd as Yd = Ỹd ∩ (CY∗ × IR). The following lemma, whose proof is deferred to the Appendix, relates a variety of distances to relative boundaries of sets arising in the dual problem:

Lemma 4 Given a dual feasible point (y, u) ∈ Yd, let s = c − A^t y ∈ effdom u(·). Then:

1. dist((y, u), rel∂(CY∗ × IR)) = dist(y, rel∂CY∗) .

2. dist((y, u), rel∂Ỹd) ≥ (1/max{1, ‖A‖}) dist((s, u), rel∂C∗) .

3. dist((y, u), rel∂Yd) ≥ (1/max{1, ‖A‖}) min { dist((s, u), rel∂C∗) , dist(y, rel∂CY∗) } .

4. dist(y, rel∂ΠYd) ≥ dist((y, u), rel∂Yd) .
We now proceed with the proofs of the results of this subsection.
Proof of Proposition 6: If y = 0 the result is true. If y 6= 0, then Proposition 9
shows that there exists ŷ such that kŷk = 1 and ŷ t y = kyk∗ . For any ε > 0, define the
following perturbed problem instance:

Ā = A − (1/‖y‖∗) ŷ c^t ,   b̄ = b + (((−b^t y + u)^+ + ε)/‖y‖∗) ŷ ,   c̄ = c .

We note that, for the data d̄ = (Ā, b̄, c̄), the point (y, u) satisfies (A2d̄) in Lemma 1, and therefore (GPd̄) is infeasible. We conclude that ρP(d) ≤ ‖d − d̄‖, which implies

ρP(d) ≤ max { ‖c‖∗ , (−b^t y + u)^+ + ε } / ‖y‖∗ ,

and so

ρP(d) ≤ max { ‖c‖∗ , −(b^t y − u) } / ‖y‖∗ .
The following technical lemma, which concerns the optimization problem (DP ) below, is used in the subsequent proofs. Problem (DP ) is parameterized by given points
y 0 ∈ CY∗ and s0 ∈ R∗ , and is defined by
(DP)   maxy,δ,s,θ  θ
       s.t.  −A^t y + δc − s = θ (A^t y0 − c + s0)
             ‖y‖∗ + |δ| ≤ 1
             y ∈ CY∗
             δ ≥ 0
             s ∈ R∗ .        (29)
Lemma 5 Consider any y 0 ∈ CY∗ and s0 ∈ R∗ such that At y 0 + s0 6= c. If ρD (d) > 0,
then there exists a point (y, δ, s, θ) feasible for problem (DP ) that satisfies
θ ≥ ρD(d) / ‖c − A^t y0 − s0‖∗ > 0 .        (30)
Proof: Note that problem (DP ) is feasible for any y 0 and s0 since (y, δ, s, θ) = (0, 0, 0, 0)
is always feasible. Therefore it can either be unbounded or have a finite optimal objective
value. If (DP ) is unbounded, we can find feasible points with an objective function large
enough such that (30) holds. If (DP ) has a finite optimal value, say θ∗ , then it follows
from elementary arguments that it attains this value. Since ρD (d) > 0 implies Yd 6= ∅,
Theorem 6 implies that the optimal solution (y ∗ , δ ∗ , s∗ , θ∗ ) for (DP ) satisfies (30).
Proof of Proposition 7: Assume c − At y 0 6∈ relintR∗ , otherwise from Proposition 1,
the point (ȳ, ū) = (y 0 , u(c − At y 0 )) satisfies the assertion of the proposition. We consider
problem (DP ), defined by (29), with y 0 and s0 ∈ relintR∗ such that kc − At y 0 − s0 k ≤
dist(c−At y 0 , R∗ )+ε. From Lemma 5 we have that there exists a point (y, δ, s, θ) feasible
for (DP ) that satisfies
θ ≥ ρD(d)/‖c − A^t y0 − s0‖∗ ≥ ρD(d)/(dist(c − A^t y0, R∗) + ε) .

Define

ȳ = (y + θy0)/(δ + θ)   and   s̄ = (s + θs0)/(δ + θ) .

By construction we have ȳ ∈ CY∗, c − A^t ȳ = s̄ ∈ relint R∗. Therefore from Proposition 1 (ȳ, u(c − A^t ȳ)) ∈ Yd, and letting ξ = max{1, ‖y0‖∗} we have

‖ȳ − y0‖∗ = ‖y − δy0‖∗/(δ + θ) ≤ ((‖y‖∗ + δ) ξ)/θ ≤ ((dist(c − A^t y0, R∗) + ε)/ρD(d)) ξ .
Proof of Theorem 9: Note that ρD (d) > 0 implies Yd 6= ∅; note also that ρD (d) is
finite, otherwise Proposition 2 shows that R = {0} which is a subspace. Set s0 ∈ R∗
such that ‖s0‖∗ = ‖A‖ and τR∗ = dist(s0, rel∂R∗)/‖s0‖∗. We also assume for now that c − A^t y0 ≠ s0. We show later in the proof how to handle the case when c − A^t y0 = s0. Denote ry0 = dist(y0, rel∂CY∗), and rs0 = dist(s0, rel∂R∗) = τR∗ ‖A‖ > 0.
With the points y0 and s0, use Lemma 5 to obtain a point (y, δ, s, θ) feasible for (DP) which from inequality (30) satisfies

0 < 1/θ ≤ (‖c − A^t y0‖∗ + ‖A‖)/ρD(d) .        (31)
Define the following:

ȳ = (y + θy0)/(δ + θ) ,   s̄ = (s + θs0)/(δ + θ) ,   rȳ = θ ry0/(δ + θ) ,   rs̄ = θ rs0/(δ + θ) .

By construction dist(ȳ, rel∂CY∗) ≥ rȳ, dist(s̄, rel∂R∗) ≥ rs̄, and c − A^t ȳ = s̄. Therefore, from Proposition 1 the point (ȳ, u(s̄)) ∈ Yd. We now choose ū so that (ȳ, ū) ∈ Yd and bound its distance to the relative boundary. Since relint R∗ ⊆ effdom u(·), from Proposition 11 and Proposition 12 we have that for any ε > 0, the ball B(s̄, rs̄/(1 + ε)) ∩ LR∗ ⊂ relint effdom u(·). Define the function µ̃(·, ·) by

µ̃(s̄, κ) := κ (1/‖A‖) + sup { u(s) | ‖s − s̄‖∗ ≤ κ , s ∈ R∗ } .

Note that µ̃(·, ·) is finite for every s̄ ∈ relint effdom u(·) and κ ∈ [0, dist(s̄, rel∂R∗)), because it is defined as the supremum of the continuous function u(·) over a closed and bounded subset contained in the relative interior of its effective domain, see Theorem 10.1 of [18]. We define ū = µ̃(s̄, rs̄/(1 + ε)), and since ū ≥ rs̄/(‖A‖(1 + ε)) + u(s̄) ≥ u(s̄), the point (ȳ, ū) ∈ Yd. Let us now bound dist((ȳ, ū), rel∂Yd). Consider any vector v ∈ LCY∗ ∩ {y | −A^t y ∈ LR∗} such that ‖v‖∗ ≤ 1; then

ȳ + αv ∈ CY∗ for any |α| ≤ rȳ ,

and

c − A^t(ȳ + αv) = s̄ + α(−A^t v) ∈ B(s̄, rs̄/(1 + ε)) ∩ LR∗ for any |α| ≤ rs̄/(‖A‖(1 + ε)) .

This last inclusion implies that (c − A^t(ȳ + αv), ū + β) = (s̄ + α(−A^t v), ū + β) ∈ C∗ for any |α|, |β| ≤ rs̄/(‖A‖(1 + ε)). We have shown that dist(ȳ, rel∂CY∗) ≥ rȳ and dist((c − A^t ȳ, ū), rel∂C∗) ≥ rs̄/(‖A‖(1 + ε)). Therefore item 3 of Lemma 4 implies

dist((ȳ, ū), rel∂Yd) ≥ (1/max{1, ‖A‖}) min { rȳ , rs̄/(‖A‖(1 + ε)) }
                      ≥ (1/((1 + ε) max{1, ‖A‖})) (θ/(δ + θ)) min { ry0 , rs0/‖A‖ }
                      = θ min{ry0, τR∗} / ((1 + ε) max{1, ‖A‖}(δ + θ)) .
To finish the proof, we bound the different expressions in the statement of the theorem; let ξ = max{1, kAk} to simplify notation. Here we use inequality (31):
1. (a) ‖ȳ − y0‖∗ = ‖y − δy0‖∗/(δ + θ) ≤ (1/θ) max{1, ‖y0‖∗} ≤ ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}.

   (b) ‖ȳ‖∗ ≤ ‖y‖∗(1/θ) + ‖y0‖∗ ≤ 1/θ + ‖y0‖∗ ≤ ‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d).

2. (a) 1/dist(ȳ, rel∂CY∗) ≤ 1/rȳ = (δ + θ)/(θ ry0) ≤ (1/ry0)(1 + 1/θ) ≤ (1/ry0)(1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

   (b) 1/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε)ξ/min{ry0, τR∗}) (δ + θ)/θ ≤ ((1 + ε)ξ/min{ry0, τR∗})(1 + 1/θ) ≤ ((1 + ε)ξ/min{ry0, τR∗})(1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

3. (a) ‖ȳ − y0‖∗/dist(ȳ, rel∂CY∗) ≤ ‖y − δy0‖∗/(θ ry0) ≤ (1/ry0)(1/θ) max{1, ‖y0‖∗} ≤ (1/ry0)((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}.

   (b) ‖ȳ − y0‖∗/dist((ȳ, ū), rel∂Yd) ≤ (1 + ε)ξ ‖y − δy0‖∗/(θ min{ry0, τR∗}) ≤ ((1 + ε)ξ/min{ry0, τR∗})(1/θ) max{1, ‖y0‖∗} ≤ ((1 + ε)ξ/min{ry0, τR∗})((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}.

   (c) ‖ȳ‖∗/dist(ȳ, rel∂CY∗) ≤ ‖y + θy0‖∗/(θ ry0) ≤ (1/ry0)(‖y0‖∗ + 1/θ) ≤ (1/ry0)(‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

   (d) ‖ȳ‖∗/dist((ȳ, ū), rel∂Yd) ≤ ‖y + θy0‖∗(1 + ε)ξ/(θ min{ry0, τR∗}) ≤ ((1 + ε)ξ/min{ry0, τR∗})(‖y0‖∗ + 1/θ) ≤ ((1 + ε)ξ/min{ry0, τR∗})(‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

For the case c − A^t y0 = s0, define ȳ = y0 and ū = µ̃(s0, τR∗ ‖A‖/(1 + ε)). The proof then proceeds exactly as above, except that now we show that dist((c − A^t ȳ, ū), rel∂C∗) ≥ τR∗/(1 + ε), which implies that dist((ȳ, ū), rel∂Yd) ≥ (1/(max{1, ‖A‖}(1 + ε))) min{τR∗, ry0} from item 3 of Lemma 4. This inequality is then used to prove each item in the theorem.
Finally, we note that Theorem 10 can be proved using almost identical arguments as
in the proof of Theorem 9, but with a careful analysis to handle the special cases when
R or CY are subspaces, see [12] for the exact details.
6  Sensitivity under Perturbation
In this section we present several results that bound the deformation of primal and dual
feasible regions and objective function values under data perturbation. All proofs are
deferred to the end of the section.
The following two theorems bound the deformation of the primal and dual feasible
regions under data perturbation. These results are essentially extensions of Assertion 2
of Theorem 1.1 of [16] to the primal and dual problems in the GSM format.
Theorem 11 Suppose that ρP (d) > 0. Let ∆d = (∆A, ∆b, ∆c) be such that Xd+∆d 6= ∅
and consider any x0 ∈ Xd+∆d . Then there exists x̄ ∈ Xd satisfying
‖x̄ − x0‖ ≤ (‖∆b‖ + ‖∆A‖ ‖x0‖) max{1, ‖x0‖}/ρP(d) .
Theorem 12 Suppose that ρD (d) > 0. Let ∆d = (∆A, ∆b, ∆c) be such that Yd+∆d ≠ ∅
and consider any (y 0 , u0 ) ∈ Yd+∆d . Then for any ε > 0, there exists (ȳ, ū) ∈ Yd satisfying

kȳ − y 0 k∗ ≤ (k∆ck∗ + k∆Akky 0 k∗ + ε) max{1, ky 0 k∗ }/ρD (d) .
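As a simple illustration of Theorem 11 (the instance and the choice of data norm below are ours, for illustration only), take n = m = 1, P = IR+ , CY = IR+ , and d = (A, b, c) = (1, 1, 1), so that Xd = [1, ∞). Assuming the data norm k∆dk = max{k∆Ak, k∆bk, k∆ck}, the primal can only be made infeasible by driving A to zero or below, so ρP (d) = 1; a similar calculation gives ρD (d) = 1, hence ρ(d) = 1. For the perturbation ∆d = (0, −1/2, 0) we have Xd+∆d = [1/2, ∞); taking x0 = 1/2, Theorem 11 guarantees a point x̄ ∈ Xd with kx̄ − x0 k ≤ (1/2 + 0) max{1, 1/2}/1 = 1/2, and indeed x̄ = 1 attains this bound exactly.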
The next two results bound changes in optimal objective function values under data
perturbation. Proposition 8 and Theorem 13 below respectively extend to the ground-set
model format Lemma 3.9 and Assertion 5 of Theorem 1.1 in [16].
Proposition 8 Suppose that d ∈ F and ρ(d) > 0. Let ∆d = (0, ∆b, 0) be such that
Xd+∆d ≠ ∅. Then,

z∗ (d + ∆d) − z∗ (d) ≥ −k∆bk max{kck∗ , −z∗ (d)}/ρP (d) .
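Continuing the one-dimensional illustration above (again not part of the development), the perturbation ∆d = (0, −1/2, 0) gives z∗ (d) = 1 and z∗ (d + ∆d) = 1/2, so z∗ (d + ∆d) − z∗ (d) = −1/2, while the right-hand side of Proposition 8 equals −(1/2) max{1, −1}/1 = −1/2; the bound holds with equality for this instance.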
Theorem 13 Suppose that d ∈ F and ρ(d) > 0. Let ∆d = (∆A, ∆b, ∆c) satisfy k∆dk <
ρ(d). Then, if x∗ and x̂ are optimal solutions for (GPd ) and (GPd+∆d ) respectively,

|z∗ (d + ∆d) − z∗ (d)| ≤ k∆bk max{kck∗ + k∆ck∗ , −z∗ (d)}/(ρP (d) − k∆dk)
 + ( k∆ck∗ + k∆Ak max{kck∗ + k∆ck∗ , −z∗ (d)}/(ρP (d) − k∆dk) ) max{kx∗ k, kx̂k} .
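For the same illustrative one-dimensional instance and perturbation ∆d = (0, −1/2, 0), we have k∆dk = 1/2 < ρ(d) = 1, x∗ = 1, and x̂ = 1/2, and the right-hand side of Theorem 13 evaluates to (1/2)(1)/(1 − 1/2) + (0 + 0) max{1, 1/2} = 1, which indeed bounds |z∗ (d + ∆d) − z∗ (d)| = 1/2.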
Proof of Theorem 11: We consider problem (P P ), defined by (22), with x0 equal to the
given point and w0 such that (A + ∆A)x0 − (b + ∆b) = w0 ∈ CY . From Lemma 3 we have that there
exists a point (x, t, w, θ) feasible for (P P ) that satisfies

θ ≥ ρP (d)/kb − Ax0 + w0 k = ρP (d)/k∆Ax0 − ∆bk ≥ ρP (d)/(k∆Akkx0 k + k∆bk) .

We define

x̄ = (x + θx0 )/(t + θ) ,   w̄ = (w + θw0 )/(t + θ) .

By construction we have that x̄ ∈ P , Ax̄ − b = w̄ ∈ CY , therefore x̄ ∈ Xd , and

kx̄ − x0 k = kx − tx0 k/(t + θ) ≤ (kxk + t) max{1, kx0 k}/θ ≤ [(k∆Akkx0 k + k∆bk)/ρP (d)] max{1, kx0 k} .
Proof of Theorem 12: From Proposition 11 we have that for any ε > 0 there exists
ξ ≠ ∆At y 0 − ∆c such that kξk∗ ≤ ε and c + ∆c + ξ − (A + ∆A)t y 0 ∈ relint R∗ . We consider
problem (DP ), defined by (29), with y 0 equal to the given point and s0 := c + ∆c + ξ − (A + ∆A)t y 0 ∈ relint R∗ .
From Lemma 5 we have that there exists a point (y, δ, s, θ) feasible for (DP ) that satisfies

θ ≥ ρD (d)/kc − At y 0 − s0 k∗ = ρD (d)/k∆At y 0 − ∆c − ξk∗ ≥ ρD (d)/(k∆ck∗ + k∆Akky 0 k∗ + ε) .

We define

ȳ = (y + θy 0 )/(δ + θ) ,   s̄ = (s + θs0 )/(δ + θ) .

By construction we have that ȳ ∈ CY∗ and c − At ȳ = s̄ ∈ relint R∗ ⊆ effdom u(·), from
Proposition 11 and Proposition 12. Therefore from Proposition 1, (ȳ, u(c − At ȳ)) ∈ Yd
and

kȳ − y 0 k∗ = ky − δy 0 k∗ /(δ + θ) ≤ (kyk∗ + δ) max{1, ky 0 k∗ }/θ ≤ [(k∆ck∗ + k∆Akky 0 k∗ + ε)/ρD (d)] max{1, ky 0 k∗ } .
Proof of Proposition 8: The hypothesis that ρ(d) > 0 implies that the GSM format
problem with data d has zero duality gap and (GPd ) and (GDd ) attain their optimal
values, see Corollary 1. Also, since Yd+∆d = Yd ≠ ∅ has a Slater point (because ρD (d) > 0),
and Xd+∆d ≠ ∅, (GPd+∆d ) and (GDd+∆d ) have no duality gap and (GPd+∆d ) attains
its optimal value, see Theorem 2. Let (y, u) ∈ Yd be an optimal solution of (GDd ); due
to the form of the perturbation, the point (y, u) ∈ Yd+∆d , and therefore
z ∗ (d + ∆d) ≥ (b + ∆b)t y − u = z ∗ (d) + ∆bt y ≥ z ∗ (d) − k∆bkkyk∗ .
The result now follows using the bound on the norm of dual feasible solutions from
Proposition 6 and the strong duality for data instances d and d + ∆d.
Proof of Theorem 13: The hypotheses ρ(d) > 0 and k∆dk < ρ(d) imply that ρ(d + ∆d) > 0,
and hence that the GSM format problems with data d and d + ∆d both have zero duality gap and all
problems attain their optimal values, see Corollary 1.
Let x̂ ∈ Xd+∆d be an optimal solution for (GPd+∆d ). Define the perturbation ∆d̃ =
(0, ∆b − ∆Ax̂, 0). Then by construction the point x̂ ∈ Xd+∆d̃ . Therefore

z∗ (d + ∆d) = (c + ∆c)t x̂ ≥ −k∆ck∗ kx̂k + ct x̂ ≥ −k∆ck∗ kx̂k + z∗ (d + ∆d̃) .

Invoking Proposition 8, we bound the optimal objective function value for the problem
instance d + ∆d̃:

z∗ (d + ∆d) + k∆ck∗ kx̂k ≥ z∗ (d + ∆d̃) ≥ z∗ (d) − k∆b − ∆Ax̂k max{kck∗ , −z ∗ (d)}/ρP (d) .

Therefore

z∗ (d + ∆d) − z∗ (d) ≥ −k∆ck∗ kx̂k − (k∆bk + k∆Akkx̂k) max{kck∗ , −z ∗ (d)}/ρP (d) .
Changing the roles of d and d + ∆d, we can construct the following upper bound:

z∗ (d + ∆d) − z∗ (d) ≤ k∆ck∗ kx∗ k + (k∆bk + k∆Akkx∗ k) max{kc + ∆ck∗ , −z ∗ (d + ∆d)}/ρP (d + ∆d) ,
where x∗ ∈ Xd is an optimal solution for (GPd ). The value −z ∗ (d + ∆d) can be replaced
by −z ∗ (d) on the right side of the previous bound. To see this consider two cases. If
−z ∗ (d + ∆d) ≤ −z ∗ (d), then we can do the replacement since it yields a larger bound. If
−z ∗ (d + ∆d) > −z ∗ (d), the inequality above has a negative left side and a positive right
side after the replacement. Note also that because of the hypothesis k∆dk < ρ(d), the
distance to infeasibility satisfies ρP (d + ∆d) ≥ ρP (d) − k∆dk > 0. We finish the proof
by combining the previous two bounds, incorporating the lower bound on ρP (d + ∆d), and
using strong duality for the data instances d and d + ∆d.
7 Concluding Remarks
We have shown herein that most of the essential results regarding condition numbers for
conic convex optimization problems can be extended to the non-conic ground-set model
format (GPd ). We have attempted herein to highlight the most important and/or useful
extensions; for other results see [12].
It is interesting to note the absence of results that directly bound z∗ (d) or the norms
of optimal solutions kx∗ k, ky ∗ k of (GPd ) and (GDd ) as in Assertions 3 and 4 of Theorem
1.1 of [16]. Such bounds are very important in relating the condition number theory
to the complexity of algorithms. However, we do not believe that such bounds can
be demonstrated for (GPd ) without further assumptions. The reason for this is subtle
yet simple. Observe from Theorem 6 that ρD (d) depends only on d = (A, b, c), CY ,
and the recession cone R of P . That is, P only affects ρD (d) through its recession
cone, and so information about the “bounded” portion of P is irrelevant to the value of
ρD (d). For this reason it is not possible to bound the norm of primal optimal solutions x
directly, and hence one cannot bound z∗ (d) directly either. Under rather mild additional
assumptions, it is possible to analyze the complexity of algorithms for solving (GPd ),
see [12] as well as a forthcoming paper on this topic.
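A small example (ours, and only illustrative) makes this concrete: for the family of instances with n = m = 1, A = 1, b = 0, c = 1, CY = IR+ , and P = [M, ∞) with M ≥ 0, the data d, the cone CY , and the recession cone R = IR+ are the same for every M , so ρD (d) is the same for every M , yet the optimal solution x∗ = M and the optimal value z∗ (d) = M grow without bound as M increases.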
Note that the characterization results for ρP (d) and ρD (d) presented herein in Theorems 5 and 6 pertain only to the case when d ∈ F. A characterization of ρ(d) for d ∉ F
is the subject of future research.
8 Appendix
This appendix contains supporting mathematical results that are used in the proofs of
the results of this paper. We point the reader to existing proofs for the most well known
results.
Proposition 9 (Proposition 2 of [8]) Let X be an n-dimensional normed vector space
with dual space X ∗ . For every x ∈ X, there exists x̄ ∈ X ∗ with the property that kx̄k∗ = 1
and kxk = x̄t x.
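For example (an illustration, not part of [8]), if X = IRn is endowed with the ℓ1 norm, then for x ≠ 0 one may take x̄ to be the vector of signs of the components of x, so that kx̄k∗ = kx̄k∞ = 1 and x̄t x = kxk1 = kxk.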
Proposition 10 (Theorems 11.1 and 11.3 of [18]) Given two non-empty convex sets
S and T in IRn , then relint S ∩ relint T = ∅ if and only if S and T can be properly
separated, i.e., there exists y ≠ 0 such that

inf x∈S y t x ≥ sup z∈T y t z   and   sup x∈S y t x > inf z∈T y t z .
The following is a restatement of Corollary 14.2.1 of [18], which relates the effective
domain of u(·) of (14) to the recession cone of P , where recall that R∗ denotes the dual
of the recession cone R defined in (8).
Proposition 11 (Corollary 14.2.1 of [18]) Let R denote the recession cone of the nonempty
convex set P , and define u(·) by (14). Then cl effdom u(·) = R∗ .
Proposition 12 (Theorem 6.3 of [18]) For any convex set Q ⊆ IRn , cl relint Q = cl Q,
and relint cl Q = relint Q.
The following lemma is central in relating the two alternative characterizations of
the distance to infeasibility and is used in the proofs in Section 4.
Lemma 6 Consider two closed convex cones C ⊆ IRn and CY ⊆ IRm , and data (M, v) ∈
IRm×n × IRm . Strong duality holds between
(P ) : z∗ = min kM t y + qk∗
s.t. y t v ≥ 1
y ∈ CY∗
q ∈ C∗
and (D) : z ∗ = max θ
s.t. M x − θv ∈ CY
kxk ≤ 1
θ≥0
x∈C .
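As a quick sanity check of the lemma (our example, with all data chosen for illustration), take n = m = 1, C = CY = IR+ , M = 1, and v = 1. In (P ) the constraints force y ≥ 1 and q ≥ 0, so z∗ = 1; in (D) the constraints read x − θ ≥ 0, |x| ≤ 1, θ ≥ 0, x ≥ 0, so z ∗ = 1, and the two optimal values indeed coincide.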
Proof: The proof that weak duality holds between (P ) and (D) is straightforward,
therefore z ∗ ≤ z∗ . Assume z ∗ < z∗ , and set ε > 0 such that 0 ≤ z ∗ < z∗ − ε. Consider
the following nonempty convex set S:
S := { (u, δ, α) | ∃ y, q s.t. y + u ∈ CY∗ , q + δ ∈ C ∗ , y t v ≥ 1 − α, kM t y + qk∗ ≤ z∗ − ε } .
Then (0, 0, 0) ∉ S, and from Proposition 10 there exists (z, x, θ) ≠ 0 such that z t u +
xt δ + θα ≥ 0 for any (u, δ, α) ∈ S. For any y ∈ IRm , ũ ∈ CY∗ , δ̃ ∈ C ∗ , π ≥ 0, and q̃ such
that kq̃k∗ ≤ z∗ − ε, define q = −M t y + q̃, u = −y + ũ, δ = −q + δ̃, and α = 1 − y t v + π.
This construction implies that the point (u, δ, α) ∈ S, and that for all y, ũ ∈ CY∗ ,
δ̃ ∈ C ∗ , π ≥ 0, and kq̃k∗ ≤ z∗ − ε it holds that:
0 ≤ z t (−y + ũ) + xt (M t y − q̃ + δ̃) + θ(1 − y t v + π)
= y t (M x − θv − z) + z t ũ + xt δ̃ − xt q̃ + θ + θπ .
This implies that M x − θv = z ∈ CY , x ∈ C, θ ≥ 0, and θ ≥ xt q̃ for kq̃k∗ ≤ z∗ − ε. If
x ≠ 0, re-scale (z, x, θ) such that kxk = 1 and then (x, θ) is feasible for (D). Set q̃ =
(z∗ − ε)q̂, where q̂ is given by Proposition 9 and is such that kq̂k∗ = 1 and q̂ t x = kxk = 1.
It then follows that z ∗ ≥ θ ≥ xt q̃ = z∗ − ε > z ∗ , which is a contradiction.
If x = 0, the above expression implies −θv = z ∈ CY , and θ ≥ 0. If θ > 0 then
−v ∈ CY , which means that the point (0, β) is feasible for (D) for any β ≥ 0, implying
that z ∗ = ∞, a contradiction since z ∗ < z∗ . If θ = 0, then z = 0, which is a contradiction
since (z, x, θ) ≠ 0.
The next two lemmas concern properties of the distance to the relative boundary of
a convex set.
Lemma 7 Given convex sets A and B, and a point x ∈ A ∩ B, then
dist(x, rel∂(A ∩ B)) ≥ min {dist(x, rel∂A), dist(x, rel∂B)} .
Proof: The proof of this lemma is based on showing that LA∩B \ (A ∩ B) ⊂ (LA \ A) ∪
(LB \ B). If this inclusion is true, then
dist(x, rel∂(A ∩ B)) = inf x̄∈LA∩B \(A∩B) kx − x̄k
 ≥ inf x̄∈(LA \A)∪(LB \B) kx − x̄k
 = min { inf x̄∈LA \A kx − x̄k , inf x̄∈LB \B kx − x̄k }
 = min {dist(x, rel∂A), dist(x, rel∂B)} ,
which proves the lemma. Therefore we now prove the inclusion. Consider some x̄ ∈
LA∩B ; this means that there exist αi ∈ IR and xi ∈ A ∩ B, i ∈ I a finite set, with Σi∈I αi = 1,
such that x̄ = Σi∈I αi xi . Since xi ∈ A and xi ∈ B, we have that x̄ ∈ LA and x̄ ∈ LB .
Therefore LA∩B ⊂ LA ∩ LB . The desired inclusion is then obtained with a little algebra:

LA∩B \ (A ∩ B) ⊂ LA ∩ LB ∩ (A ∩ B)C
 = LA ∩ LB ∩ (AC ∪ B C )
 = (LA ∩ LB ∩ AC ) ∪ (LA ∩ LB ∩ B C )
 ⊂ (LA ∩ AC ) ∪ (LB ∩ B C )
 = (LA \ A) ∪ (LB \ B) .
Last of all, we have:
Proof of Lemma 4: Equality (1.) is a consequence of the fact that (y, u) ∈ LCY∗ ×IR \
(CY∗ × IR) if and only if y ∈ LCY∗ \ CY∗ , and that k(y, u) − (ȳ, u)k∗ = ky − ȳk∗ + |u − u| =
ky − ȳk∗ .
To prove inequality (2.), we first show that if (ȳ, ū) ∈ LỸd \ Ỹd , then (c − At ȳ, ū) ∈
LC ∗ \ C ∗ . First note that if (ȳ, ū) ∉ Ỹd then, by the definition of Ỹd , (c − At ȳ, ū) ∉ C ∗ ;
therefore we only need to show that if (ȳ, ū) ∈ LỸd then (c − At ȳ, ū) ∈ LC ∗ . Let
(ȳ, ū) ∈ LỸd . Then there exist αi ∈ IR and (yi , ui ) ∈ Ỹd , i ∈ I a finite set, with Σi∈I αi = 1,
such that (ȳ, ū) = Σi∈I αi (yi , ui ). Consider

(c − At ȳ, ū) = (c − At Σi∈I αi yi , Σi∈I αi ui ) = Σi∈I αi (c − At yi , ui ) .

Since (yi , ui ) ∈ Ỹd , then (c − At yi , ui ) ∈ C ∗ and therefore (c − At ȳ, ū) ∈ LC ∗ .
The inclusion above means that

dist((s, u), rel∂C ∗ ) = inf (s̄,ū)∈LC ∗ \C ∗ k(s, u) − (s̄, ū)k∗
 ≤ inf (ȳ,ū)∈LỸd \Ỹd k(s, u) − (c − At ȳ, ū)k∗
 = inf (ȳ,ū)∈LỸd \Ỹd max{ks − (c − At ȳ)k∗ , |u − ū|}
 = inf (ȳ,ū)∈LỸd \Ỹd max{kAt ȳ − At yk∗ , |u − ū|}
 ≤ inf (ȳ,ū)∈LỸd \Ỹd max{kAkkȳ − yk∗ , |u − ū|}
 ≤ inf (ȳ,ū)∈LỸd \Ỹd max{kAk, 1} max{ky − ȳk∗ , |u − ū|}
 ≤ max{kAk, 1} inf (ȳ,ū)∈LỸd \Ỹd (ky − ȳk∗ + |u − ū|)
 = max{kAk, 1} inf (ȳ,ū)∈LỸd \Ỹd k(y, u) − (ȳ, ū)k∗
 = max{kAk, 1} dist((y, u), rel∂ Ỹd ) .
Inequality (3.) follows from the observation that Yd = Ỹd ∩ (CY∗ × IR), Lemma 7, and
the bounds obtained in (1.) and (2.).
The proof of item (4.) uses the fact (soon to be proved) that if ȳ ∈ LΠYd \ ΠYd then
for any ū, (ȳ, ū) ∈ LYd \ Yd . Then from the definition of the distance to the relative
boundary we have
dist((y, u), rel∂Yd ) = inf (ȳ,ū)∈LYd \Yd k(y, u) − (ȳ, ū)k∗
 ≤ inf ȳ∈LΠYd \ΠYd , ū k(y, u) − (ȳ, ū)k∗
 = inf ȳ∈LΠYd \ΠYd , ū (ky − ȳk∗ + |u − ū|)
 = inf ȳ∈LΠYd \ΠYd ky − ȳk∗
 = dist(y, rel∂ΠYd ) ,
which proves inequality (4.). To finish the proof we now show that if ȳ ∈ LΠYd \ ΠYd
then for any ū, (ȳ, ū) ∈ LYd \ Yd . The fact that ȳ ∉ ΠYd implies that for any ū,
(ȳ, ū) ∉ Yd . Now since ȳ ∈ LΠYd , there exist αi ∈ IR and (yi , ui ) ∈ Yd , i ∈ I a finite set,
such that ȳ = Σi∈I αi yi and Σi∈I αi = 1. Since for any (y, u) ∈ Yd and β ≥ 0 the point
(y, u + β) ∈ Yd , we can express the point (ȳ, ū) by the following sum of points in Yd :

(ȳ, ū) = ( Σi∈I αi yi + y − y , Σi∈I αi ui + u + β1 − u − β2 ) ,

where for any ū, β1 = (ū − Σi∈I αi ui )+ and β2 = (ū − Σi∈I αi ui )− . This shows that for
any ū, (ȳ, ū) ∈ LYd , completing the proof.
References
[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming, Theory
and Algorithms. John Wiley & Sons, Inc, New York, second edition, 1993.
[2] F. Cucker and J. Peña. A primal-dual algorithm for solving polyhedral conic systems
with a finite-precision machine. Technical report, GSIA, Carnegie Mellon University,
2001.
[3] M. Epelman and R. M. Freund. A new condition measure, preconditioners, and
relations between different measures of conditioning for conic linear systems. SIAM
Journal on Optimization, 12(3):627–655, 2002.
[4] S. Filipowski. On the complexity of solving sparse symmetric linear programs specified with approximate data. Mathematics of Operations Research, 22(4):769–792,
1997.
[5] S. Filipowski. On the complexity of solving feasible linear programs specified with
approximate data. SIAM Journal on Optimization, 9(4):1010–1040, 1999.
[6] R. M. Freund and J. R. Vera. Condition-based complexity of convex optimization
in conic linear form via the ellipsoid algorithm. SIAM Journal on Optimization,
10(1):155–176, 1999.
[7] R. M. Freund and J. R. Vera. On the complexity of computing estimates of condition
measures of a conic linear system. Technical Report, Operations Research Center,
MIT, August 1999.
[8] R. M. Freund and J. R. Vera. Some characterizations and properties of the “distance
to ill-posedness” and the condition measure of a conic linear system. Mathematical
Programming, 86(2):225–260, 1999.
[9] J. L. Goffin. The relaxation method for solving systems of linear inequalities. Mathematics of Operations Research, 5(3):388–414, 1980.
[10] M. A. Nunez and R. M. Freund. Condition measures and properties of the central
trajectory of a linear program. Mathematical Programming, 83(1):1–28, 1998.
[11] M. A. Nunez and R. M. Freund. Condition-measure bounds on the behavior of
the central trajectory of a semi-definite program. SIAM Journal on Optimization,
11(3):818–836, 2001.
[12] F. Ordóñez. On the Explanatory Value of Condition Numbers for Convex Optimization: Theoretical Issues and Computational Experience. PhD thesis, Massachusetts
Institute of Technology, 2002.
[13] F. Ordóñez and R. M. Freund. Computational experience and the explanatory
value of condition measures for linear optimization. Working Paper OR361-02,
MIT, Operations Research Center, 2002.
[14] J. Peña. Computing the distance to infeasibility: theoretical and practical issues.
Technical report, Center for Applied Mathematics, Cornell University, 1998.
[15] J. Peña and J. Renegar. Computing approximate solutions for convex conic systems
of constraints. Mathematical Programming, 87(3):351–383, 2000.
[16] J. Renegar. Some perturbation theory for linear programming. Mathematical Programming, 65(1):73–91, 1994.
[17] J. Renegar. Linear programming, complexity theory, and elementary functional
analysis. Mathematical Programming, 70(3):279–351, 1995.
[18] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New
Jersey, 1997.
[19] J. R. Vera. Ill-posedness and the computation of solutions to linear programs with
approximate data. Technical Report, Cornell University, May 1992.
[20] J. R. Vera. Ill-Posedness in Mathematical Programming and Problem Solving with
Approximate Data. PhD thesis, Cornell University, 1992.
[21] J. R. Vera. Ill-posedness and the complexity of deciding existence of solutions to
linear programs. SIAM Journal on Optimization, 6(3):549–569, 1996.
[22] J. R. Vera. On the complexity of linear programming under finite precision arithmetic. Mathematical Programming, 80(1):91–123, 1998.