MIT Sloan School of Management
Working Paper 4286-03
February 2003
On an Extension of Condition Number
Theory to Non-Conic Convex Optimization
Robert M. Freund and Fernando Ordóñez
© 2003 by Robert M. Freund and Fernando Ordóñez. All rights reserved.
Short sections of text, not to exceed two paragraphs, may be quoted without
explicit permission, provided that full credit including © notice is given to the source.
This paper also can be downloaded without charge from the
Social Science Research Network Electronic Paper Collection:
http://ssrn.com/abstract=382363
On an Extension of Condition Number Theory to
Non-Conic Convex Optimization
Robert M. Freund∗ and Fernando Ordóñez†
February 14, 2003
Abstract
The purpose of this paper is to extend, as much as possible, the modern theory
of condition numbers for conic convex optimization:

z∗ := minx { c^t x | Ax − b ∈ CY , x ∈ CX } ,

to the more general non-conic format:

(GPd)   z∗ := minx { c^t x | Ax − b ∈ CY , x ∈ P } ,

where P is any closed convex set, not necessarily a cone, which we call the ground-set. Although any convex problem can be transformed to conic form, such transformations are neither unique nor natural given the natural description of many problems, thereby diminishing the relevance of data-based condition number theory. Herein we extend the modern theory of condition numbers to the problem format (GPd). As a byproduct, we are able to state and prove natural extensions of many theorems from the conic-based theory of condition numbers to this broader problem format.
Key words: Condition number, convex optimization, conic optimization, duality, sensitivity analysis, perturbation theory.
∗ MIT Sloan School of Management, 50 Memorial Drive, Cambridge, MA 02142, USA, email: rfreund@mit.edu
† Industrial and Systems Engineering, University of Southern California, GER-247, Los Angeles, CA 90089-0193, USA, email: fordon@usc.edu
1  Introduction
The modern theory of condition numbers for convex optimization problems was developed by Renegar in [16] and [17] for convex optimization problems in the following conic
format:
(CPd)   z∗ := minx { c^t x | Ax − b ∈ CY , x ∈ CX } ,        (1)
where CX ⊆ X and CY ⊆ Y are closed convex cones, A is a linear operator from the
n-dimensional vector space X to the m-dimensional vector space Y, b ∈ Y, and c ∈ X ∗
(the space of linear functionals on X ). The data d for (CPd ) is defined as d := (A, b, c).
The theory of condition numbers for (CPd ) focuses on three measures – ρP (d), ρD (d),
and C(d), to bound various behavioral and computational quantities pertaining to
(CPd ). The quantity ρP (d) is called the “distance to primal infeasibility” and is the
smallest data perturbation ∆d for which (CPd+∆d ) is infeasible. The quantity ρD (d) is
called the “distance to dual infeasibility” for the conic dual (CDd ) of (CPd ):
(CDd)   z^* := maxy { b^t y | c − A^t y ∈ CX∗ , y ∈ CY∗ } ,        (2)
and is defined similarly to ρP (d) but using the conic dual problem instead (which conveniently is of the same general conic format as the primal problem). The quantity C(d)
is called the “condition measure” or the “condition number” of the problem instance d
and is a (positively) scale-invariant reciprocal of the smallest data perturbation ∆d that
will render the perturbed data instance either primal or dual infeasible:
C(d) := ‖d‖ / min{ρP(d), ρD(d)} ,        (3)
for a suitably defined norm k · k on the space of data instances d. A problem is called
“ill-posed” if min{ρP (d), ρD (d)} = 0, equivalently C(d) = ∞. These three condition
measure quantities have been shown in theory to be connected to a wide variety of
bounds on behavioral characteristics of (CPd ) and its dual, including bounds on sizes
of feasible solutions, bounds on sizes of optimal solutions, bounds on optimal objective
values, bounds on the sizes and aspect ratios of inscribed balls in the feasible region,
bounds on the rate of deformation of the feasible region under perturbation, bounds on
changes in optimal objective values under perturbation, and numerical bounds related
to the linear algebra computations of certain algorithms, see [16], [5], [4], [6], [7], [8],
[21], [19], [22], [20], [14], [15]. In the context of interior-point methods for linear and
semidefinite optimization, these same three condition measures have also been shown to
be connected to various quantities of interest regarding the central trajectory, see [10]
and [11]. The connection of these condition measures to the complexity of algorithms
has been shown in [6], [7], [17], [2], and [3], and some of the references contained therein.
The conic format (CPd ) covers a very general class of convex problems; indeed any
convex optimization problem can be transformed to an equivalent instance of (CPd ).
However, such transformations are not necessarily unique and are sometimes rather
unnatural given the “natural” description and the natural data for the problem. The
condition number theory developed in the aforementioned literature pertains only to
convex optimization problems in conic form, and the relevance of this theory is diminished to the extent that many practical convex optimization problems are not conveyed
in conic format. Furthermore, the transformation of a problem to conic form can result
in dramatically different condition numbers depending on the choice of transformation,
see the example in Section 2 of [13].
Motivated to overcome these shortcomings, herein we extend the condition number
theory to non-conic convex optimization problems. We consider the more general format
for convex optimization:
(GPd)   z∗(d) = minx { c^t x | Ax − b ∈ CY , x ∈ P } ,        (4)
where P is allowed to be any closed convex set, possibly unbounded, and possibly without
interior. For example, P could be the solution set of box constraints of the form l ≤ x ≤ u
where some components of l and/or u might be unbounded, or P might be the solution
of network flow constraints of the form N x = g, x ≥ 0. And of course, P might also be a
closed convex cone. We call P the ground-set and we refer to (GPd ) as the “ground-set
model” (GSM) format.
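For illustration (a small instance of our own construction, not used later), a problem in GSM format whose ground-set is neither a cone nor unbounded is

minx∈IR²  x1 + x2   s.t.  x1 − x2 − 1 ∈ IR+ ,  x ∈ P := {x | 0 ≤ x ≤ (1, 1)^t} ,

with data d = (A, b, c) = ([1, −1], 1, (1, 1)^t) and CY = IR+; here (A, b, c) remain the natural data of the problem, while the box P is treated as part of the problem's fixed structure rather than as data.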
We present the definition of the condition number for problem instances of the more
general GSM format in Section 2, where we also demonstrate some basic properties. A
number of results from condition number theory are extended to the GSM format in
the subsequent sections of the paper. In Section 3 we prove that a problem instance
with a finite condition number has primal and dual Slater points, which in turn implies
that strong duality holds for the problem instance and its dual. In Section 4 we provide
characterizations of the condition number as the solution to associated optimization
problems. In Section 5 we show that if the condition number of a problem instance
is finite, then there exist primal and dual interior solutions that have good geometric
properties. In Section 6 we show that the rate of deformation of primal and dual feasible
regions and optimal objective function values due to changes in the data are bounded
by functions of the condition number. Section 7 contains concluding remarks.
We now present the notation and general assumptions that we will use throughout
the paper.
Notation and General Assumptions. We denote the variable space X by IRn and
the constraint space Y by IRm . Therefore, P ⊆ IRn , CY ⊆ IRm , A is an m by n real
matrix, b ∈ IRm , and c ∈ IRn . The spaces X ∗ and Y ∗ of linear functionals on IRn and
IRm can be identified with IRn and IRm , respectively. For v, w ∈ IRn or IRm , we write v t w
for the standard inner product. We denote by D the vector space of all data instances
d = (A, b, c). A particular data instance is denoted equivalently by d or (A, b, c). We
define the norm for a data instance d by kdk := max{kAk, kbk, kck∗ }, where the norms
kxk and kyk on IRn and IRm are given, kAk denotes the usual operator norm, and
k · k∗ denotes the dual norm associated with the norm k · k on IRn or IRm , respectively.
Let B(v, r) denote the ball centered at v with radius r, using the norm for the space
of variables v. For a convex cone S, let S ∗ denote the (positive) dual cone, namely
S ∗ := {s | st x ≥ 0 for all x ∈ S}. Given a set Q ⊂ IRn , we denote the closure and
relative interior of Q by cl Q and relint Q, respectively. We use the convention that if Q
is the singleton Q = {q}, then relint Q = Q. We adopt the standard conventions 1/0 = ∞ and 1/∞ = 0.
We also make the following two general assumptions:
Assumption 1 P 6= ∅ and CY 6= ∅.
Assumption 2 Either CY 6= IRm or P is not bounded (or both).
Clearly if either P = ∅ or CY = ∅, problem (GPd) is infeasible regardless of A, b, and c. Therefore Assumption 1 avoids settings wherein all problem instances are trivially infeasible. Assumption 2 is needed to avoid settings where (GPd) is feasible
for every d = (A, b, c) ∈ D. This will be explained further in Section 2.
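For instance, if CY = IRm then Ax − b ∈ CY holds trivially for every x, so any point of P is feasible for (GPd) no matter what the data d is; if in addition P is bounded, every instance is dual feasible as well (see Proposition 2 below), so every d ∈ D would have infinite distance to infeasibility and the condition number would carry no information.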
2  Condition Numbers for (GPd) and its Dual

2.1  Distance to Primal Infeasibility
We denote the feasible region of (GPd ) by:
Xd := {x ∈ IRn | Ax − b ∈ CY , x ∈ P } .
(5)
Let FP := {d ∈ D | Xd 6= ∅}, i.e., FP is the set of data instances for which (GPd ) has a
feasible solution. Similar to the conic case, the primal distance to infeasibility, denoted
by ρP (d), is defined as:
ρP(d) := inf {‖∆d‖ | Xd+∆d = ∅} = inf {‖∆d‖ | d + ∆d ∈ FP^C} .        (6)
2.2  The Dual Problem and Distance to Dual Infeasibility
In the case when P is a cone, the conic dual problem (2) is of the same basic format as
the primal problem. However, when P is not a cone, we must first develop a suitable
dual problem, which we do in this subsection. Before doing so we introduce a dual
pair of cones associated with the ground-set P . Define the closed convex cone C by
homogenizing P to one higher dimension:
C := cl {(x, t) ∈ IRn × IR | x ∈ tP, t > 0} ,
(7)
and note that C = {(x, t) ∈ IRn × IR | x ∈ tP, t > 0} ∪ (R × {0}) where R is the recession cone of P , namely
R := {v ∈ IRn | there exists x ∈ P for which x + θv ∈ P for all θ ≥ 0} .
(8)
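For example, if P = {x ∈ IRn | x ≥ l} for some fixed l, then R = IRn+; any bounded P has R = {0}; and when P is itself a closed convex cone, R = P.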
It is straightforward to show that the (positive) dual cone C ∗ of C is
C ∗ := {(s, u) ∈ IRn × IR | st x + u · t ≥ 0 for all (x, t) ∈ C}
= {(s, u) ∈ IRn × IR | st x + u ≥ 0, for all x ∈ P }
= {(s, u) ∈ IRn × IR | inf x∈P st x + u ≥ 0} .
(9)
The standard Lagrangian dual of (GPd) can be constructed as:

maxy∈CY∗  infx∈P { c^t x + (b − Ax)^t y } ,

which we re-write as:

maxy∈CY∗  infx∈P { b^t y + (c − A^t y)^t x } .        (10)

With the help of (9) we re-write (10) as:

(GDd)   z^*(d) = maxy,u { b^t y − u | (c − A^t y, u) ∈ C∗ , y ∈ CY∗ } .        (11)
We consider the formulation (11) to be the dual problem of (4). The feasible region of
(GDd ) is:
Yd := {(y, u) ∈ IRm × IR | (c − At y, u) ∈ C ∗ , y ∈ CY∗ }.
(12)
Let FD := {d ∈ D | Yd 6= ∅}, i.e., FD is the set of data instances for which (GDd ) has a
feasible solution. The dual distance to infeasibility, denoted by ρD (d), is defined as:
ρD(d) := inf {‖∆d‖ | Yd+∆d = ∅} = inf {‖∆d‖ | d + ∆d ∈ FD^C} .        (13)
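As a consistency check of this construction, note that when the ground-set is itself a closed convex cone, P = CX, definition (9) gives C∗ = CX∗ × IR+ (since infx∈CX s^t x equals 0 for s ∈ CX∗ and −∞ otherwise); taking u = 0 at an optimal solution, (11) then reduces to the conic dual (2).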
We also present an alternate form of (11), which does not use the auxiliary variable u, based on the function u(·) defined by

u(s) := − infx∈P s^t x .        (14)
It follows from Theorem 5.5 in [18] that u(·), the support function of the set P , is a
convex function. The epigraph of u(·) is:
epi u(·) := {(s, v) ∈ IRn × IR | v ≥ u(s)} ,
and the projection of the epigraph onto the space of the variables s is the effective
domain of u(·):
effdom u(·) := {s ∈ IRn | u(s) < ∞} .
It then follows from (9) that
C ∗ = epi u(·) ,
and so (GDd) can alternatively be written as:

z^*(d) = maxy { b^t y − u(c − A^t y) | c − A^t y ∈ effdom u(·) , y ∈ CY∗ } .        (15)
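For example, for the box P = {x ∈ IRn | 0 ≤ x ≤ e} (e the vector of ones), one obtains u(s) = − inf0≤x≤e s^t x = Σi max{0, −si}, so u(·) is finite everywhere, effdom u(·) = IRn, and the inclusion c − A^t y ∈ effdom u(·) in (15) imposes no restriction.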
Evaluating the inclusion (y, u) ∈ Yd is not necessarily an easy task, as it involves
checking the inclusion (c − At y, u) ∈ C ∗ , and C ∗ is an implicitly defined cone. A very
useful tool for evaluating the inclusion (y, u) ∈ Yd is given in the following proposition,
where recall from (8) that R is the recession cone of P .
Proposition 1 If y satisfies y ∈ CY∗ and c − At y ∈ relintR∗ , then u(c − At y) is finite,
and for all u ≥ u(c − At y) it holds that (y, u) is feasible for (GDd ).
Proof: Note from Proposition 11 of the Appendix that cl effdom u(·) = R∗ and
from Proposition 12 of the Appendix that c − At y ∈ relintR∗ = relint cl effdom u(·) =
relint effdom u(·) ⊆ effdom u(·). This shows that u(c − At y) is finite and (c − At y, u(c −
At y)) ∈ C ∗ . Therefore (y, u) is feasible for (GDd ) for all u ≥ u(c − At y).
2.3  Condition Number
A data instance d = (A, b, c) is consistent if both the primal and dual problems have
feasible solutions. Let F denote the set of consistent data instances, namely F :=
FP ∩ FD = {d ∈ D | Xd 6= ∅ and Yd 6= ∅}. For d ∈ F, the distance to infeasibility is
defined as:

ρ(d) := min{ρP(d), ρD(d)} = inf {‖∆d‖ | Xd+∆d = ∅ or Yd+∆d = ∅} ,        (16)

the interpretation being that ρ(d) is the size of the smallest perturbation of d which will render the perturbed problem instance either primal or dual infeasible. The condition number of the instance d is defined as

C(d) := ‖d‖/ρ(d) if ρ(d) > 0, and C(d) := ∞ if ρ(d) = 0,
which is a (positive) scale-invariant reciprocal of the distance to infeasibility. This definition of condition number for convex optimization problems was first introduced by
Renegar for problems in conic form in [16] and [17].
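As a small worked illustration of these definitions (our own example), consider n = m = 1 with P = IR+, CY = IR+, and d = (A, b, c) = (1, −1, 1), i.e., min{x : x + 1 ≥ 0, x ≥ 0}. A perturbed instance is primal infeasible exactly when b + ∆b > 0 and A + ∆A ≤ 0, and dual infeasible exactly when c + ∆c < 0 and A + ∆A ≥ 0, so ρP(d) = ρD(d) = 1; since ‖d‖ = 1, this instance has C(d) = 1, the smallest possible value (cf. Remark 1 below).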
2.4  Basic Properties of ρP (d), ρD (d), and C(d), and Alternative Duality Results
The need for Assumptions 1 and 2 is demonstrated by the following:
Proposition 2 For any data instance d ∈ D,
1. ρP (d) = ∞ if and only if CY = IRm .
2. ρD (d) = ∞ if and only if P is bounded.
The proof of this proposition relies on Lemmas 1 and 2, which are versions of “theorems of the alternative” for primal and dual feasibility of (GPd ) and (GDd ). These two
lemmas are stated and proved at the end of this section.
Proof of Proposition 2: Clearly CY = IRm implies that ρP (d) = ∞. Also, if P is
bounded, then R = {0} and R∗ = IRn , whereby from Proposition 1 we have that (GDd )
is feasible for any d, and so ρD (d) = ∞. Therefore for both items it only remains to
prove the converse implication. Recall that we denote d = (A, b, c).
Assume that ρP(d) = ∞, and suppose that CY ≠ IRm. Then CY∗ ≠ {0}; consider a point ỹ ∈ CY∗, ỹ ≠ 0. Define the perturbation ∆d = (∆A, ∆b, ∆c) = (−A, −b + ỹ, −c) and d̄ = d + ∆d. Then the point (y, u) = (ỹ, ỹ^t ỹ/2) satisfies the alternative system (A2d̄) of Lemma 1 for the data d̄ = (0, ỹ, 0), whereby Xd̄ = ∅. Therefore ‖d̄ − d‖ ≥ ρP(d) = ∞, a contradiction, and so CY = IRm.
Now assume that ρD (d) = ∞, and suppose that P is not bounded, and so R 6= {0}.
Consider x̃ ∈ R, x̃ 6= 0, and define the perturbation ∆d = (−A, −b, −c − x̃). Then the
point x̃ satisfies the alternative system (B2d¯) of Lemma 2 for the data d¯ = d + ∆d =
(0, 0, −x̃), whereby Yd¯ = ∅. Therefore kd¯ − dk ≥ ρD (d) = ∞, a contradiction, and so P
is bounded.
Remark 1 If d ∈ F, then C(d) ≥ 1.
Proof: Consider the data instance d0 = (0, 0, 0). Note that Xd0 = P ≠ ∅ and Yd0 = CY∗ × IR+ ≠ ∅, therefore d0 ∈ F. If CY ≠ IRm, consider b̃ ∈ IRm \ CY, b̃ ≠ 0, and for any ε > 0 define the instance dε = (0, −εb̃, 0). This instance is such that for any ε > 0, Xdε = ∅, which means that dε ∈ FP^C and therefore ρP(d) ≤ infε>0 ‖d − dε‖ ≤ ‖d‖.
If CY = IRm , then Assumption 2 implies that P is unbounded. This means that there
exists a ray r ∈ R, r 6= 0. For any ε > 0 the instance dε = (0, 0, −εr) is such that
Ydε = ∅, which means that dε ∈ FDC and therefore ρD (d) ≤ inf ε>0 kd − dε k ≤ kdk .
In each case we have ρ(d) = min{ρP(d), ρD(d)} ≤ ‖d‖, which implies the result.
The following two lemmas present weak and strong alternative results for (GPd ) and
(GDd ), and are used in the proofs of Proposition 2 and elsewhere.
Lemma 1 Consider the following systems with data d = (A, b, c):

(Xd):   Ax − b ∈ CY ,  x ∈ P ;

(A1d):  (−A^t y, u) ∈ C∗ ,  b^t y ≥ u ,  y ≠ 0 ,  y ∈ CY∗ ;

(A2d):  (−A^t y, u) ∈ C∗ ,  b^t y > u ,  y ∈ CY∗ .
If system (Xd ) is infeasible, then system (A1d ) is feasible. Conversely, if system
(A2d ) is feasible, then system (Xd ) is infeasible.
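For instance, when P = CX is a cone (so that C∗ = CX∗ × IR+ and u only enters through u ≥ 0), system (A1d) reduces to −A^t y ∈ CX∗, b^t y ≥ 0, y ∈ CY∗, y ≠ 0, and (A2d) to the same system with b^t y > 0; these are the familiar weak and strict Farkas-type alternatives for the conic feasibility system Ax − b ∈ CY, x ∈ CX.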
Proof: Assume that system (Xd ) is infeasible. This implies that
b 6∈ S := {Ax − v | x ∈ P, v ∈ CY } ,
which is a nonempty convex set. Using Proposition 10 we can separate b from S and
therefore there exists y 6= 0 such that
y t (Ax − v) ≤ y t b for all x ∈ P, v ∈ CY .
Set u := y t b, then the inequality implies that y ∈ CY∗ and that (−At y)t x + u ≥ 0 for any
x ∈ P . Therefore (−At y, u) ∈ C ∗ and (y, u) satisfies system (A1d ).
Conversely, if both (A2d) and (Xd) are feasible then

0 ≤ y^t(Ax − b) = (A^t y)^t x − b^t y < −((−A^t y)^t x + u) ≤ 0 .
Lemma 2 Consider the following systems with data d = (A, b, c):

(Yd):   (c − A^t y, u) ∈ C∗ ,  y ∈ CY∗ ;

(B1d):  Ax ∈ CY ,  c^t x ≤ 0 ,  x ≠ 0 ,  x ∈ R ;

(B2d):  Ax ∈ CY ,  c^t x < 0 ,  x ∈ R .
If system (Yd ) is infeasible, then system (B1d ) is feasible. Conversely, if system (B2d )
is feasible, then system (Yd ) is infeasible.
Proof: Assume that system (Yd) is infeasible; this implies that

(0, 0, 0) ∉ S := { (s, v, q) | ∃ y, u s.t. (c − A^t y, u) + (s, v) ∈ C∗ , y + q ∈ CY∗ } ,
which is a nonempty convex set. Using Proposition 10 we separate the point (0, 0, 0) from
S and therefore there exists (x, δ, z) 6= 0 such that xt s + δv + z t q ≥ 0 for all (s, v, q) ∈ S.
For any (y, u), (s̃, ṽ) ∈ C ∗ , and q̃ ∈ CY∗ , define s = −(c − At y) + s̃, v = −u + ṽ, and
q = −y + q̃. By construction (s, v, q) ∈ S and therefore for any y, u, (s̃, ṽ) ∈ C ∗ , q̃ ∈ CY∗
we have
−xt c + (Ax − z)t y + xt s̃ − δu + δṽ + z t q̃ ≥ 0 .
The above inequality implies that δ = 0, Ax = z ∈ CY , x ∈ R, and ct x ≤ 0. In addition
x 6= 0, because otherwise (x, δ, z) = (x, 0, Ax) = 0. Therefore (B1d ) is feasible.
Conversely, if both (B2d ) and (Yd ) are feasible then
0 ≤ xt (c − At y) = ct x − y t Ax < −y t Ax ≤ 0 .
3  Slater Points, Distance to Infeasibility, and Strong Duality
In this section we prove that the existence of a Slater point in either (GPd ) or (GDd ) is
sufficient to guarantee that strong duality holds for these problems. We then show that
a positive distance to infeasibility implies the existence of Slater points, and use these
results to show that strong duality holds whenever ρP (d) > 0 or ρD (d) > 0. We first
state a weak duality result.
Proposition 3 Weak duality holds between (GPd) and (GDd), that is, z^*(d) ≤ z_*(d).

Proof: Consider x and (y, u) feasible for (GPd) and (GDd), respectively. Then

0 ≤ (c − A^t y)^t x + u = c^t x − y^t Ax + u ≤ c^t x − b^t y + u,

where the last inequality follows from y^t(Ax − b) ≥ 0. Therefore z_*(d) ≥ z^*(d).
A classic constraint qualification in the history of constrained optimization is the
existence of a Slater point in the feasible region, see for example Theorem 30.4 of [18]
or Chapter 5 of [1]. We now define a Slater point for problems in the GSM format.
Definition 1 A point x is a Slater point for problem (GPd ) if
x ∈ relintP and Ax − b ∈ relintCY .
A point (y, u) is a Slater point for problem (GDd ) if
y ∈ relintCY∗ and (c − At y, u) ∈ relintC ∗ .
We now present the statements of the main results of this section, deferring the
proofs to the end of the section. The following two theorems show that the existence of
a Slater point in the primal or dual is sufficient to guarantee strong duality as well as
attainment in the dual or the primal problem, respectively.
Theorem 1 If x0 is a Slater point for problem (GPd ), then z∗ (d) = z ∗ (d). If in addition
z∗ (d) > −∞, then Yd 6= ∅ and problem (GDd ) attains its optimum.
Theorem 2 If (y0, u0) is a Slater point for problem (GDd), then z_*(d) = z^*(d). If in addition z^*(d) < ∞, then Xd ≠ ∅ and problem (GPd) attains its optimum.
The next three results show that a positive distance to infeasibility is sufficient to
guarantee the existence of Slater point for the primal and the dual problems, respectively,
and hence is sufficient to ensure that strong duality holds. The fact that a positive
distance to infeasibility implies the existence of an interior point in the feasible region
is shown for the conic case in Theorems 15, 17, and 19 in [8] and Theorem 3.1 in [17].
Theorem 3 Suppose that ρP (d) > 0. Then there exists a Slater point for (GPd ).
Theorem 4 Suppose that ρD (d) > 0. Then there exists a Slater point for (GDd ).
Corollary 1 (Strong Duality) If ρP (d) > 0 or ρD (d) > 0, then z∗ (d) = z ∗ (d). If
ρ(d) > 0, then both the primal and the dual attain their respective optimal values.
Proof: The proof of this result is a straightforward consequence of Theorems 1, 2,
3, and 4.
Note that the contrapositive of Corollary 1 says that if d ∈ F and z∗ (d) > z ∗ (d),
then ρP (d) = ρD (d) = 0 and so ρ(d) = 0. In other words, if a data instance d is primal
and dual feasible but has a positive optimal duality gap, then d must necessarily be
arbitrarily close to being both primal infeasible and dual infeasible.
Proof of Theorem 1: For simplicity, let z∗ and z ∗ denote the primal and dual
optimal objective values, respectively. The interesting case is when z∗ > −∞, otherwise
weak duality implies that (GDd ) is infeasible and z∗ = z ∗ = −∞. If z∗ > −∞ the point
(0, 0, 0) does not belong to the non-empty convex set
S := { (p, q, α) | ∃ x s.t. x + p ∈ P, Ax − b + q ∈ CY, c^t x − α < z_* } .
We use Proposition 10 to properly separate (0, 0, 0) from S, which implies that there
exists (γ, y, π) 6= 0 such that γ t p + y t q + πα ≥ 0 for all (p, q, α) ∈ S. Note that π ≥ 0
because α is not upper bounded in the definition of S.
If π > 0, re-scale (γ, y, π) such that π = 1. For any x ∈ IRn , p̃ ∈ P , q̃ ∈ CY , and
ε > 0 define p = −x + p̃, q = −Ax + b + q̃, and α = ct x − z∗ + ε. By construction the
point (p, q, α) ∈ S and the proper separation implies that for all x, p̃ ∈ P , q̃ ∈ CY , and
ε>0
0 ≤ γ t (−x + p̃) + y t (−Ax + b + q̃) + ct x − z∗ + ε
= (−At y + c − γ)t x + γ t p̃ + y t q̃ + y t b − z∗ + ε .
This expression implies that c − At y = γ, y ∈ CY∗ , and (c − At y, u) ∈ C ∗ for u :=
y t b−z∗ . Therefore (y, u) is feasible for (GDd ) and z ∗ ≥ bt y −u = bt y −y t b+z∗ = z∗ ≥ z ∗ ,
which implies that z ∗ = z∗ and the dual feasible point (y, u) attains the dual optimum.
If π = 0, the same construction used above and proper separation gives the following
inequality for all x, p̃ ∈ P , and q̃ ∈ CY
0 ≤ γ t (−x + p̃) + y t (−Ax + b + q̃)
= (−At y − γ)t x + γ t p̃ + y t q̃ + y t b .
This implies that −At y = γ and y ∈ CY∗ , which implies that −y t Ap̃ + y t q̃ + y t b ≥ 0 for
any p̃ ∈ P , q̃ ∈ CY . Proper separation also guarantees that there exists (p̂, q̂, α̂) ∈ S
such that γ t p̂ + y t q̂ + π α̂ = −y t Ap̂ + y t q̂ > 0.
Let x0 be the Slater point of (GPd) and x̂ such that x̂ + p̂ ∈ P, Ax̂ − b + q̂ ∈ CY, and c^t x̂ − α̂ < z_*. For all |ξ| sufficiently small, x0 + ξ(x̂ + p̂ − x0) ∈ P and Ax0 − b + ξ(Ax̂ − b + q̂ − (Ax0 − b)) ∈ CY. Therefore

0 ≤ −y^t A (x0 + ξ(x̂ + p̂ − x0)) + y^t (Ax0 − b + ξ(Ax̂ − b + q̂ − (Ax0 − b))) + y^t b
  = ξ (−y^t Ax̂ − y^t Ap̂ + y^t Ax0 + y^t Ax̂ − y^t b + y^t q̂ − y^t Ax0 + y^t b)
  = ξ (−y^t Ap̂ + y^t q̂) ,

a contradiction, since ξ can be negative and −y^t Ap̂ + y^t q̂ > 0. Therefore π ≠ 0, completing the proof.
Proof of Theorem 2: For simplicity, let z∗ and z ∗ denote the primal and dual
optimal objective values respectively. The interesting case is when z ∗ < ∞, otherwise
weak duality implies that (GPd ) is infeasible and z∗ = z ∗ = ∞. If z ∗ < ∞ the point
(0, 0, 0, 0) does not belong to the non-empty convex set
S := { (s, v, q, α) | ∃ y, u s.t. (c − A^t y, u) + (s, v) ∈ C∗, y + q ∈ CY∗, b^t y − u + α > z^* } .
We use Proposition 10 to properly separate (0, 0, 0, 0) from S, which implies that there
exists (x, β, γ, δ) 6= 0 such that xt s + βv + γ t q + δα ≥ 0 for all (s, v, q, α) ∈ S. Note that
δ ≥ 0 because α is not upper bounded in the definition of S.
If δ > 0, re-scale (x, β, γ, δ) such that δ = 1. For any y ∈ IRm , u ∈ IR, (s̃, ṽ) ∈ C ∗ ,
q̃ ∈ CY∗ , and ε > 0, define s = −c+At y+s̃, v = −u+ṽ, q = −y+q̃, and α = z ∗ −bt y+u+ε.
By construction the point (s, v, q, α) ∈ S and proper separation implies that for all y, u,
(s̃, ṽ) ∈ C ∗ , q̃ ∈ CY∗ , and ε > 0
0 ≤ xt (−c + At y + s̃) + β(−u + ṽ) + γ t (−y + q̃) + z ∗ − bt y + u + ε
= (Ax − b − γ)t y + (x, β)t (s̃, ṽ) + (1 − β)u + γ t q̃ − ct x + z ∗ + ε .
This implies that Ax − b = γ ∈ CY , β = 1, ct x ≤ z ∗ , and (x, 1) ∈ C, which means that
x ∈ P . Therefore x is feasible for (GPd ) and z ∗ ≥ ct x ≥ z∗ ≥ z ∗ , which implies that
z ∗ = z∗ and the primal feasible point x attains the optimum.
If δ = 0, the same construction used above and proper separation gives the following
inequality for all y, u, (s̃, ṽ) ∈ C ∗ , q̃ ∈ CY∗
0 ≤ xt (−c + At y + s̃) + β(−u + ṽ) + γ t (−y + q̃)
= (Ax − γ)t y + (x, β)t (s̃, ṽ) − βu + γ t q̃ − ct x .
This implies that Ax = γ ∈ CY , β = 0, which means that xt s̃ + xt At q̃ − ct x ≥ 0 for
any (s̃, ũ) ∈ C ∗ and q̃ ∈ CY∗ . The proper separation also guarantees that there exists
(ŝ, v̂, q̂, α̂) ∈ S such that xt ŝ + βv̂ + γ t q̂ = xt ŝ + xt At q̂ > 0.
Let (y0, u0) be the Slater point of (GDd) and (ŷ, û) such that (c − A^t ŷ + ŝ, û + v̂) ∈ C∗, ŷ + q̂ ∈ CY∗, and b^t ŷ − û + α̂ > z^*. Then for all |ξ| sufficiently small, we have that y0 + ξ(ŷ + q̂ − y0) ∈ CY∗ and

(c − A^t y0 + ξ(c − A^t ŷ + ŝ − c + A^t y0) , u0 + ξ(û + v̂ − u0)) ∈ C∗ .

Therefore

x^t (c − A^t y0 + ξ(c − A^t ŷ + ŝ − c + A^t y0)) + x^t A^t (y0 + ξ(ŷ + q̂ − y0)) − c^t x ≥ 0 .

Simplifying and canceling, we obtain

0 ≤ ξ (−x^t A^t ŷ + x^t ŝ + x^t A^t y0 + x^t A^t ŷ + x^t A^t q̂ − x^t A^t y0) = ξ (x^t ŝ + x^t A^t q̂) ,

a contradiction, since ξ can be negative and x^t ŝ + x^t A^t q̂ > 0. Therefore δ ≠ 0, completing the proof.
Proof of Theorem 3: Equation (6) and ρP (d) > 0 imply that Xd 6= ∅. Assume that Xd
contains no Slater point, then relintCY ∩{Ax − b | x ∈ relintP } = ∅ and these nonempty
convex sets can be separated using Proposition 10. Therefore there exists y 6= 0 such
that for any s ∈ CY , x ∈ P we have
y t s ≥ y t (Ax − b) .
Let u = y^t b; from the inequality above we have that y ∈ CY∗ and −y^t Ax + u ≥ 0 for any x ∈ P, which implies that (−A^t y, u) ∈ C∗. Define bε = b + (ε/‖y‖∗) ŷ, with ŷ given by Proposition 9 such that ‖ŷ‖ = 1 and ŷ^t y = ‖y‖∗. Then the point (y, u) is feasible for Problem (A2dε) of Lemma 1 with data dε = (A, bε, c) for any ε > 0. This implies that Xdε = ∅ and therefore ρP(d) ≤ infε>0 ‖d − dε‖ = infε>0 ε/‖y‖∗ = 0, a contradiction.
Proof of Theorem 4: Equation (13) and ρD (d) > 0 imply that Yd 6= ∅. Assume that
Yd contains no Slater point. Consider the nonempty convex set S defined by:
S := { (c − A^t y, u) | y ∈ relint CY∗ , u ∈ IR } .
No Slater point in the dual implies that relintC ∗ ∩ S = ∅. Therefore we can properly
separate these two nonempty convex sets using Proposition 10, whereby there exists
(x, t) 6= 0 such that for any (s, v) ∈ C ∗ , y ∈ CY∗ , u ∈ IR we have
xt s + tv ≥ xt c − At y + tu .
The above inequality implies that Ax ∈ CY, c^t x ≤ 0, (x, t) ∈ C, and t = 0. This last fact implies that x ≠ 0 and x ∈ R. Let x̂ be such that ‖x̂‖∗ = 1 and x̂^t x = ‖x‖ (see Proposition 9). For any ε > 0, define cε = c − (ε/‖x‖) x̂. Then the point x is feasible for Problem (B2dε) of Lemma 2 with data dε = (A, b, cε). This implies that Ydε = ∅ and consequently ρD(d) ≤ infε>0 ‖d − dε‖ = infε>0 ε/‖x‖ = 0, a contradiction.
The contrapositives of Theorems 3 and 4 are not true. Consider for example the data

A = 0 (the 2 × 2 zero matrix) ,   b = (−1, 0)^t ,   and   c = (1, 0)^t ,
and the sets CY = IR+ × {0} and P = CX = IR+ × IR. Problem (GPd ) for this example
has a Slater point at (1, 0), and ρP (d) = 0 (perturbing by ∆b = (0, ε) makes the problem
infeasible for any ε.) Problem (GDd ) for the same example has a Slater point at (1, 0)
and ρD (d) = 0 (perturbing by ∆c = (0, ε) makes the problem infeasible for any ε).
4  Characterization of ρP (d) and ρD (d) via Associated Optimization Problems
Equation (16) shows that to characterize ρ(d) for consistent data instances d ∈ F, it
is sufficient to express ρP (d) and ρD (d) in a convenient form. Below we show that
these distances to infeasibility can be obtained as the solutions of certain associated
optimization problems. These results can be viewed as an extension to problems not in
conic form of Theorem 3.5 of [17], and Theorems 1 and 2 of [8].
Theorem 5 Suppose that Xd ≠ ∅. Then ρP(d) = jP(d) = rP(d), where

jP(d) = min { max { ‖A^t y + s‖∗ , |b^t y − u| }  :  ‖y‖∗ = 1, y ∈ CY∗, (s, u) ∈ C∗ }        (17)

and

rP(d) = min { max { θ : Ax − bt − vθ ∈ CY, ‖x‖ + |t| ≤ 1, (x, t) ∈ C }  :  ‖v‖ ≤ 1, v ∈ IRm } .        (18)
Theorem 6 Suppose that Yd ≠ ∅. Then ρD(d) = jD(d) = rD(d), where

jD(d) = min { max { ‖Ax − p‖ , |c^t x + g| }  :  ‖x‖ = 1, x ∈ R, p ∈ CY, g ≥ 0 }        (19)

and

rD(d) = min { max { θ : −A^t y + cδ − θv ∈ R∗, ‖y‖∗ + |δ| ≤ 1, y ∈ CY∗, δ ≥ 0 }  :  ‖v‖∗ ≤ 1, v ∈ IRn } .        (20)
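For a quick sanity check of these characterizations (our own example), take the one-dimensional data d = (A, b, c) = (1, −1, 1) with P = CY = IR+ (so that C∗ = IR+ × IR+ and R = IR+). Then (17) forces y = 1 and gives jP(d) = mins,u≥0 max{|1 + s|, |−1 − u|} = 1, while (19) forces x = 1 and gives jD(d) = minp,g≥0 max{|1 − p|, |1 + g|} = 1; both agree with the distances to infeasibility ρP(d) = ρD(d) = 1 of this instance.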
Proof of Theorem 5: Assume that jP (d) > ρP (d), then there exists a data instance
d¯ = (Ā, b̄, c̄) that is primal infeasible and kA − Āk < jP (d), kb − b̄k < jP (d), and
kc − c̄k∗ < jP (d). From Lemma 1 there is a point (ȳ, ū) that satisfies the following:
(−Āt ȳ, ū) ∈ C ∗
b̄t ȳ ≥ ū
ȳ 6= 0
ȳ ∈ CY∗ .
Scale ȳ such that kȳk∗ = 1, then (y, s, u) = (ȳ, −Āt ȳ, b̄t ȳ) is feasible for (17) and
kAt y + sk∗ = kAt ȳ − Āt ȳk∗ ≤ kA − Ākkȳk∗ < jP (d)
|bt y − u| = |bt ȳ − b̄t ȳ| ≤ kb − b̄kkȳk∗ < jP (d).
In the first inequality above we used the fact that kAt k∗ = kAk. Therefore jP (d) ≤
max {kAt y + sk∗ , |bt y − u|} < jP (d), a contradiction.
Let us now assume that jP (d) < γ < ρP (d) for some γ. This means that there exists
(ȳ, s̄, ū) such that ȳ ∈ CY∗ , kȳk∗ = 1, (s̄, ū) ∈ C ∗ , and that
kAt ȳ + s̄k∗ < γ,
|bt ȳ − ū| < γ.
From Proposition 9, consider ŷ such that kŷk = 1 and ŷ t ȳ = kȳk∗ = 1, and define, for
ε > 0,
Ā = A − ŷ ((At ȳ)t + s̄t )
b̄ε = b − ŷ (bt ȳ − ū − ε) .
We have that ȳ ∈ CY∗ , −Āt ȳ = s̄, b̄tε ȳ = ū + ε > ū, and (−Āt ȳ, ū) ∈ C ∗ . This implies
that for any ε > 0, the problem (A2d¯ε ) in Lemma 1 is feasible with data d¯ε = (Ā, b̄ε , c).
Lemma 1 then implies that Xd¯ε = ∅ and therefore ρP (d) ≤ kd − d¯ε k. To finish the proof
we compute the size of the perturbation:
‖A − Ā‖ = ‖ŷ ((A^t ȳ)^t + s̄^t)‖ ≤ ‖A^t ȳ + s̄‖∗ ‖ŷ‖ < γ

‖b − b̄ε‖ = |b^t ȳ − ū − ε| ‖ŷ‖ ≤ |b^t ȳ − ū| + ε < γ + ε,

which implies ρP(d) ≤ ‖d − d̄ε‖ = max{‖A − Ā‖, ‖b − b̄ε‖} < γ + ε < ρP(d), for ε small enough. This is a contradiction, whereby jP(d) = ρP(d).
To prove the other characterization, we note that θ ≥ 0 in Problem (18) and invoke Lemma 6 to rewrite it as

rP(d) = min { min { max { ‖A^t y + s‖∗ , |−b^t y + u| }  :  y^t v ≥ 1, y ∈ CY∗, (s, u) ∈ C∗ }  :  ‖v‖ ≤ 1, v ∈ IRm } .
The above problem can be written as the following equivalent optimization problem:

rP(d) = min { max { ‖A^t y + s‖∗ , |−b^t y + u| }  :  ‖y‖∗ ≥ 1, y ∈ CY∗, (s, u) ∈ C∗ } .

The equivalence of these problems is verified by combining the minimization operations in the first problem and using the Cauchy-Schwartz inequality. The converse makes use of Proposition 9. To finish the proof, we note that if (y, s, u) is optimal for this last problem then it also satisfies ‖y‖∗ = 1, making it equivalent to (17). Therefore

rP(d) = min { max { ‖A^t y + s‖∗ , |−b^t y + u| }  :  ‖y‖∗ = 1, y ∈ CY∗, (s, u) ∈ C∗ } = jP(d) .
Proof of Theorem 6: Assume that jD (d) > ρD (d), then there exists a data instance
d¯ = (Ā, b̄, c̄) that is dual infeasible and kA − Āk < jD (d), kb − b̄k < jD (d), and kc − c̄k∗ <
jD (d). From Lemma 2 there exists x̄ ∈ R such that x̄ 6= 0, Āx̄ ∈ CY , and c̄t x̄ ≤ 0. We
can scale x̄ such that kx̄k = 1. Then (x, p, g) = (x̄, Āx̄, −c̄t x̄) is feasible for (19), and
kAx − pk = kAx − Āxk ≤ kA − Ākkxk < jD (d)
|ct x + g| = |ct x − c̄t x| ≤ kc − c̄k∗ kxk < jD (d) .
Therefore, jD (d) ≤ max {kAx − pk, |ct x + g|} < jD (d), which is a contradiction.
Assume now that jD (d) < γ < ρD (d) for some γ. Then there exists (x̄, p̄, ḡ) such
that x̄ ∈ R, kx̄k = 1, p̄ ∈ CY , and ḡ ≥ 0, and that kAx̄ − p̄k ≤ γ and |ct x̄ + ḡ| ≤ γ.
From Proposition 9, consider x̂ such that kx̂k∗ = 1 and x̂t x̄ = kx̄k = 1, and define:
Ā = A − (Ax̄ − p̄) x̂t and c̄ε = c − x̂(ct x̄ + ḡ + ε), for ε > 0. By construction Āx̄ = p̄ ∈ CY
and c̄tε x̄ = −ḡ − ε < 0, for any ε > 0. Therefore Problem (B2d¯ε ) in Lemma 2 is feasible
for data d¯ε = (Ā, b, c̄ε ), which implies that Yd¯ε = ∅. We can then bound ρD (d) as follows:
ρD(d) ≤ ‖d − d̄ε‖ = max { ‖(Ax̄ − p̄) x̂^t‖ , ‖x̂(c^t x̄ + ḡ + ε)‖∗ } ≤ max{γ, γ + ε} = γ + ε < ρD(d)
for ε small enough, which is a contradiction. Therefore ρD (d) = jD (d).
To prove the other characterization, we note that θ ≥ 0 in Problem (20) and invoke Lemma 6 to rewrite it as

rD(d) = min { min { max { ‖−Ax + p‖ , |c^t x + g| }  :  x^t v ≥ 1, x ∈ R, p ∈ CY, g ≥ 0 }  :  ‖v‖∗ ≤ 1, v ∈ IRn } .

The above problem can be written as the following equivalent optimization problem:

rD(d) = min { max { ‖−Ax + p‖ , |c^t x + g| }  :  ‖x‖ ≥ 1, x ∈ R, p ∈ CY, g ≥ 0 } .

The equivalence of these problems is verified by combining the minimization operations in the first problem and using the Cauchy-Schwartz inequality. The converse makes use of Proposition 9. To finish the proof, we note that if (x, p, g) is optimal for this last problem then it also satisfies ‖x‖ = 1, making it equivalent to (19). Therefore

rD(d) = min { max { ‖−Ax + p‖ , |c^t x + g| }  :  ‖x‖ = 1, x ∈ R, p ∈ CY, g ≥ 0 } = jD(d) .
5  Geometric Properties of the Primal and Dual Feasible Regions
In Section 3 we showed that a positive primal and/or dual distance to infeasibility implies
the existence of a primal and/or dual Slater point, respectively. We now show that a
positive distance to infeasibility also implies that the corresponding feasible region has a
reliable solution. We consider a solution in the relative interior of the feasible region to
be a reliable solution if it has good geometric properties: it is not too far from a given
reference point, its distance to the relative boundary of the feasible region is not too
small, and the ratio of these two quantities is not too large, where these quantities are
bounded by appropriate condition numbers.
5.1  Distance to Relative Boundary, Minimum Width of Cone
An affine set T is the translation of a vector subspace L, i.e., T = a + L for some a.
The minimal affine set that contains a given set S is known as the affine hull of S. We
denote the affine hull of S by LS ; it is characterized as:
LS = { Σi∈I αi xi | αi ∈ IR, xi ∈ S, Σi∈I αi = 1, I a finite set } ,

see Section 1 in [18]. We denote by L̂S the vector subspace obtained when the affine hull LS is translated to contain the origin; i.e., for any x ∈ S, L̂S = LS − x. Note that if 0 ∈ S then LS is a subspace.
Many results in this section involve the distance of a point x ∈ S to the relative
boundary of the set S, denoted by dist(x, rel∂S), defined as follows:
Definition 2 Given a non-empty set S and a point x ∈ S, the distance from x to the
relative boundary of S is
dist(x, rel∂S) := infx̄ { ‖x − x̄‖ | x̄ ∈ LS \ S } .        (21)
Note that if S is an affine set (and in particular if S is the singleton S = {s}), then
dist(x, rel∂S) = ∞ for each x ∈ S.
We use the following definition of the min-width of a convex cone:
Definition 3 For a convex cone K, the min-width of K is defined by

τK := sup { dist(y, rel∂K)/‖y‖ | y ∈ K, y ≠ 0 } ,

for K ≠ {0}, and τK := ∞ if K = {0}.
The measure τK maximizes the ratio of the radius of a ball contained in the relative
interior of K and the norm of its center, and so it intuitively corresponds to half of the
vertex angle of the widest cylindrical cone contained in K. The quantity τK was called
the “inner measure” of K for Euclidean norms in Goffin [9], and has been used more
recently for general norms in analyzing condition measures for conic convex optimization,
see [6]. Note that if K is not a subspace, then τK ∈ (0, 1], and τK is attained for some
y 0 ∈ relintK satisfying ky 0 k = 1, as well as along the ray αy 0 for all α > 0; and τK takes
on larger values to the extent that K has larger minimum width. If K is a subspace,
then τK = ∞.
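For example, for the nonnegative orthant K = IRm+ one has τK = 1/√m under the Euclidean norm (attained along y0 = (1, . . . , 1)/√m), and τK = 1 under the ℓ∞ norm (attained along y0 = (1, . . . , 1)).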
5.2  Geometric Properties of the Feasible Region of GPd
In this subsection we present results concerning geometric properties of the feasible
region Xd of (GPd ). We defer all proofs to the end of the subsection.
The following proposition is an extension of Lemma 3.2 of [16] to the ground-set
model format.
Proposition 4 Consider any x = x̂ + r feasible for (GPd) such that x̂ ∈ P and r ∈ R. If ρD(d) > 0 then

‖r‖ ≤ (1/ρD(d)) max { ‖Ax̂ − b‖ , c^t r } .
The following result is an extension of Assertion 1 of Theorem 1.1 of [16] to the
ground-set model format of (GPd ):
Proposition 5 Consider any x0 ∈ P. If ρP(d) > 0 then there exists x̄ ∈ Xd satisfying

‖x̄ − x0‖ ≤ (dist(Ax0 − b, CY)/ρP(d)) max { 1, ‖x0‖ } .
The following is the main result of this subsection, and can be viewed as an extension
of Theorems 15, 17, and 19 of [8] to the ground-set model format of (GPd ). In Theorem
7 we assume for expository convenience that P is not an affine set and CY is not a
subspace. These assumptions are relaxed in Theorem 8.
Theorem 7 Suppose that P is not an affine set, CY is not a subspace, and consider any x0 ∈ P. If ρP(d) > 0 then there exists x̄ ∈ Xd satisfying:

1. (a) ‖x̄ − x0‖ ≤ ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}

   (b) ‖x̄‖ ≤ ‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d)

2. (a) 1/dist(x̄, rel∂P) ≤ (1/dist(x0, rel∂P)) (1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d))

   (b) 1/dist(x̄, rel∂Xd) ≤ (1/min{dist(x0, rel∂P), τCY}) (1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d))

3. (a) ‖x̄ − x0‖/dist(x̄, rel∂P) ≤ (1/dist(x0, rel∂P)) ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}

   (b) ‖x̄ − x0‖/dist(x̄, rel∂Xd) ≤ (1/min{dist(x0, rel∂P), τCY}) ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}

   (c) ‖x̄‖/dist(x̄, rel∂P) ≤ (1/dist(x0, rel∂P)) (‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d))

   (d) ‖x̄‖/dist(x̄, rel∂Xd) ≤ (1/min{dist(x0, rel∂P), τCY}) (‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d))
The statement of Theorem 8 below relaxes the assumptions on P and CY not being
affine and/or linear spaces:
Theorem 8 Consider any x0 ∈ P . If ρP (d) > 0 then there exists x̄ ∈ Xd with the
following properties:
• If P is not an affine set, x̄ satisfies all items of Theorem 7.
• If P is an affine set and CY is not a subspace, x̄ satisfies all items of Theorem
7, where items 2.(a), 3.(a), and 3.(c) are vacuously valid as both sides of these
inequalities are zero.
• If P is an affine set and CY is a subspace, x̄ satisfies all items of Theorem 7, where
items 2.(a), 2.(b), 3.(a), 3.(b), 3.(c), and 3.(d) are vacuously valid as both sides of
these inequalities are zero.
We conclude this subsection by presenting a result which captures the thrust of
Theorems 7 and 8, emphasizing how the distance to infeasibility ρP (d) and the geometric
properties of a given point x0 ∈ P bound various geometric properties of the feasible
region Xd . For x0 ∈ P , define the following measure:
gP,CY(x0) := max{‖x0‖, 1} / min{1, dist(x0, rel∂P), τCY} .

Also define the following geometric measure of the feasible region Xd:

gXd := minx∈Xd max { ‖x‖ , 1/dist(x, rel∂Xd) , ‖x‖/dist(x, rel∂Xd) } .

The following is an immediate consequence of Theorems 7 and 8.

Corollary 2 Consider any x0 ∈ P. If ρP(d) > 0 then

gXd ≤ gP,CY(x0) (1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d)) .
We now proceed with proofs of these results.
Proof of Proposition 4: If r = 0 the result is true. If r 6= 0, then Proposition 9 shows
that there exists r̂ such that kr̂k∗ = 1 and r̂t r = krk. For any ε > 0 define the following
perturbed problem instance:
Ā = A + (1/‖r‖)(Ax̂ − b) r̂^t ,   b̄ = b ,   c̄ = c + ((−(c^t r)^+ − ε)/‖r‖) r̂ .

Note that, for the data d̄ = (Ā, b̄, c̄), the point r satisfies (B2d̄) in Lemma 2, and therefore (GDd̄) is infeasible. We conclude that ρD(d) ≤ ‖d − d̄‖, which implies

ρD(d) ≤ max { ‖Ax̂ − b‖ , (c^t r)^+ + ε } / ‖r‖ ,

and so

ρD(d) ≤ max { ‖Ax̂ − b‖ , c^t r } / ‖r‖ .
The following technical lemma, which concerns the optimization problem (P P ) below, is used in the subsequent proofs. Problem (P P ) is parametrized by given points
x0 ∈ P and w0 ∈ CY , and is defined by
(PP)   maxx,t,w,θ  θ
       s.t.  Ax − bt − w = θ (b − Ax0 + w0)
             ‖x‖ + |t| ≤ 1
             (x, t) ∈ C
             w ∈ CY .        (22)

Lemma 3 Consider any x0 ∈ P and w0 ∈ CY such that Ax0 − w0 ≠ b. If ρP(d) > 0, then there exists a point (x, t, w, θ) feasible for problem (PP) that satisfies

θ ≥ ρP(d) / ‖b − Ax0 + w0‖ > 0 .        (23)
(23)
Proof: Note that problem (P P ) is feasible for any x0 and w0 since (x, t, w, θ) =
(0, 0, 0, 0) is always feasible, therefore it can either be unbounded or have a finite optimal
objective value. If (P P ) is unbounded, we can find feasible points with an objective
function large enough such that (23) holds. If (P P ) has a finite optimal value, say
θ∗ , then it follows from elementary arguments that it attains its optimal value. Since
ρP (d) > 0 implies Xd 6= ∅, Theorem 5 implies that the optimal solution (x∗ , t∗ , w∗ , θ∗ )
for (P P ) satisfies (23).
Proof of Proposition 5: Assume Ax0 − b 6∈ CY , otherwise x̄ = x0 satisfies the
proposition. We consider problem (P P ), defined by (22), with x0 and w0 ∈ CY such
that ‖Ax0 − b − w0‖ = dist(Ax0 − b, CY). From Lemma 3 we have that there exists a point (x, t, w, θ) feasible for (PP) that satisfies

θ ≥ ρP(d)/‖b − Ax0 + w0‖ = ρP(d)/dist(Ax0 − b, CY) .

Define

x̄ = (x + θx0)/(t + θ)   and   w̄ = (w + θw0)/(t + θ) .

By construction we have x̄ ∈ P, Ax̄ − b = w̄ ∈ CY, therefore x̄ ∈ Xd, and

‖x̄ − x0‖ = ‖x − tx0‖/(t + θ) ≤ ((‖x‖ + t) max{1, ‖x0‖})/θ ≤ (dist(Ax0 − b, CY)/ρP(d)) max{1, ‖x0‖} .
Proof of Theorem 7: Note that ρP (d) > 0 implies Xd 6= ∅; note also that ρP (d) is finite,
otherwise Proposition 2 shows that CY = IRm which is a subspace. Set w0 ∈ CY such
that ‖w0‖ = ‖A‖ and τCY = dist(w0, rel∂CY)/‖w0‖. We also assume that Ax0 − b ≠ w0, otherwise we can show that x̄ = x0 satisfies the theorem. Let rw0 = dist(w0, rel∂CY) = ‖A‖ τCY and let also rx0 = dist(x0, rel∂P). We invoke Lemma 3 with x0 and w0 above to obtain a point (x, t, w, θ), feasible for (PP), that from inequality (23) satisfies

0 < 1/θ ≤ (‖Ax0 − b‖ + ‖A‖)/ρP(d) .        (24)
Define the following:

x̄ = (x + θx0)/(t + θ) ,   w̄ = (w + θw0)/(t + θ) ,   rx̄ = θ rx0/(t + θ) ,   rw̄ = θ τCY/(t + θ) .
By construction dist(x̄, rel∂P ) ≥ rx̄ , dist(w̄, rel∂CY ) ≥ rw̄ kAk, and Ax̄−b = w̄ ∈ CY .
Therefore the point x̄ ∈ Xd . We now bound its distance to the relative boundary of the
feasible region.
Consider any v ∈ L̂P ∩ {y | Ay ∈ L̂CY} such that ‖v‖ ≤ 1; then

x̄ + αv ∈ P , for any |α| ≤ rx̄ ,

and

A(x̄ + αv) − b = w̄ + α(Av) ∈ CY , for any |α| ≤ rw̄ .

Therefore (x̄ + αv) ∈ Xd for any |α| ≤ min{rx̄, rw̄}, and the distance to the relative boundary of Xd is then dist(x̄, rel∂Xd) ≥ |α|‖v‖ ≥ |α|, for any |α| ≤ min{rx̄, rw̄}. Therefore dist(x̄, rel∂Xd) ≥ min{rx̄, rw̄} ≥ θ min{rx0, τCY}/(t + θ).
To finish the proof, we just have to bound the different expressions from the statement
of the theorem; here we make use of inequality (24):
1. (a) ‖x̄ − x0‖ = ‖x − tx0‖/(t + θ) ≤ (1/θ) max{1, ‖x0‖} ≤ ((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}.

   (b) ‖x̄‖ ≤ ‖x‖(1/θ) + ‖x0‖ ≤ 1/θ + ‖x0‖ ≤ ‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d).

2. (a) 1/dist(x̄, rel∂P) ≤ 1/rx̄ = (t + θ)/(θ rx0) ≤ (1/rx0)(1 + 1/θ) ≤ (1/rx0)(1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).

   (b) 1/dist(x̄, rel∂Xd) ≤ (t + θ)/(θ min{rx0, τCY}) ≤ (1/min{rx0, τCY})(1 + 1/θ) ≤ (1/min{rx0, τCY})(1 + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).

3. (a) ‖x̄ − x0‖/dist(x̄, rel∂P) ≤ ‖x − tx0‖/(θ rx0) ≤ (1/rx0)(1/θ) max{1, ‖x0‖} ≤ (1/rx0)((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}.

   (b) ‖x̄ − x0‖/dist(x̄, rel∂Xd) ≤ ‖x − tx0‖/(θ min{rx0, τCY}) ≤ (1/min{rx0, τCY})(1/θ) max{1, ‖x0‖} ≤ (1/min{rx0, τCY})((‖Ax0 − b‖ + ‖A‖)/ρP(d)) max{1, ‖x0‖}.

   (c) ‖x̄‖/dist(x̄, rel∂P) ≤ ‖x + θx0‖/(θ rx0) ≤ (1/rx0)(‖x0‖ + 1/θ) ≤ (1/rx0)(‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).

   (d) ‖x̄‖/dist(x̄, rel∂Xd) ≤ ‖x + θx0‖/(θ min{rx0, τCY}) ≤ (1/min{rx0, τCY})(‖x0‖ + 1/θ) ≤ (1/min{rx0, τCY})(‖x0‖ + (‖Ax0 − b‖ + ‖A‖)/ρP(d)).
Finally, we note that Theorem 8 can be proved using almost identical arguments as
in the proof of Theorem 7, but with a careful analysis to handle the special cases when
P is an affine set or CY is a subspace, see [12] for exact details.
5.3  Solutions in the relative interior of Yd
In this subsection we present results concerning geometric properties of the dual feasible
region Yd of (GDd ). We defer all proofs to the end of the subsection. Before proceeding,
we first discuss norms that arise when studying the dual problem. Motivated quite
naturally by (18), we define the norm k(x, t)k := kxk+|t| for points (x, t) ∈ C ⊂ IRn ×IR.
This then leads to the following dual norm for points (s, u) ∈ C ∗ ⊂ IRn × IR:
k(s, u)k∗ := max{ksk∗ , |u|} .
(25)
Consistent with the characterization of ρD (d) given by (20) in Theorem 6, we define
the following dual norm for points (y, δ) ∈ IRm × IR:
k(y, δ)k∗ := kyk∗ + |δ| .
(26)
It is clear that the above defines a norm on the vector space IRm × IR which contains Yd .
The following proposition bounds the norm of the y component of the dual feasible
solution (y, u) in terms of the objective function value bt y − u; it corresponds to Lemma
3.1 of [16] for the ground-set model format.
Proposition 6 Consider any (y, u) feasible for (GDd). If ρP(d) > 0 then

‖y‖∗ ≤ max { ‖c‖∗ , −(b^t y − u) } / ρP(d) .
The following result corresponds to Assertion 1 of Theorem 1.1 of [16] for the ground-set model format dual problem (GDd):

Proposition 7 Consider any y0 ∈ CY∗. If ρD(d) > 0 then for any ε > 0, there exists (ȳ, ū) ∈ Yd satisfying

‖ȳ − y0‖ ≤ ((dist(c − A^t y0, R∗) + ε)/ρD(d)) max { 1, ‖y0‖ } .
The following is the main result of this subsection, and can be viewed as an extension
of Theorems 15, 17, and 19 of [8] to the dual problem (GDd ). In Theorem 9 we assume
for expository convenience that CY is not a subspace and that R (the recession cone of
P ) is not a subspace. These assumptions are relaxed in Theorem 10.
Theorem 9 Suppose that R and CY are not subspaces and consider any y 0 ∈ CY∗ . If
ρD (d) > 0 then for any ε > 0, there exists (ȳ, ū) ∈ Yd satisfying:
1. (a) ‖ȳ − y0‖∗ ≤ ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}

   (b) ‖ȳ‖∗ ≤ ‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)

2. (a) 1/dist(ȳ, rel∂CY∗) ≤ (1/dist(y0, rel∂CY∗)) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))

   (b) 1/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε) max{1, ‖A‖}/min{dist(y0, rel∂CY∗), τR∗}) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))

3. (a) ‖ȳ − y0‖∗/dist(ȳ, rel∂CY∗) ≤ (1/dist(y0, rel∂CY∗)) ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}

   (b) ‖ȳ − y0‖∗/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε) max{1, ‖A‖}/min{dist(y0, rel∂CY∗), τR∗}) ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}

   (c) ‖ȳ‖∗/dist(ȳ, rel∂CY∗) ≤ (1/dist(y0, rel∂CY∗)) (‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))

   (d) ‖ȳ‖∗/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε) max{1, ‖A‖}/min{dist(y0, rel∂CY∗), τR∗}) (‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))
The statement of Theorem 10 below relaxes the assumptions on R and CY not being
linear subspaces:
Theorem 10 Consider any y 0 ∈ CY∗ . If ρD (d) > 0 then for any ε > 0, there exists
(ȳ, ū) ∈ Yd with the following properties:
• If CY is not a subspace, (ȳ, ū) satisfies all items of Theorem 9.
• If CY is a subspace and R is not a subspace, (ȳ, ū) satisfies all items of Theorem
9, where items 2.(a), 3.(a), and 3.(c) are vacuously valid as both sides of these
inequalities are zero.
• If CY and R are subspaces, (ȳ, ū) satisfies items 1.(a), 1.(b), 2.(a), 3.(a), and 3.(c)
of Theorem 9, where items 2.(a), 3.(a), and 3.(c) are vacuously valid as both sides
of these inequalities are zero. The point (ȳ, ū) also satisfies
2'.(b)  1/dist((ȳ, ū), rel∂Yd) ≤ ε

3'.(b)  ‖ȳ − y0‖∗/dist((ȳ, ū), rel∂Yd) ≤ ε

3'.(d)  ‖ȳ‖∗/dist((ȳ, ū), rel∂Yd) ≤ ε
We conclude this subsection by presenting a result which captures the thrust of
Theorems 9 and 10, emphasizing how the distance to dual infeasibility ρD (d) and the
geometric properties of a given point y 0 ∈ CY∗ bound various geometric properties of the
dual feasible region Yd . For y 0 ∈ relintCY∗ , define:
gCY∗,R∗(y0) := max{‖y0‖∗, 1} / min{1, dist(y0, rel∂CY∗), τR∗} .
We now define a geometric measure for the dual feasible region. We do not consider
the whole set Yd ; instead we consider only the projection onto the variables y. Let ΠYd
denote the projection of Yd onto the space of the y variables:
ΠYd := {y ∈ IRm | there exists u ∈ IR for which (y, u) ∈ Yd } .
(27)
Note that the set ΠYd corresponds exactly to the feasible region in the alternate formulation of the dual problem (15). We define the following geometric measure of the set
ΠYd :
gYd := inf(y,u)∈Yd max { ‖y‖∗ , 1/dist(y, rel∂ΠYd) , ‖y‖∗/dist(y, rel∂ΠYd) } .
Corollary 3 Consider any y0 ∈ CY∗. If ρD(d) > 0 then

gYd ≤ max{1, ‖A‖} gCY∗,R∗(y0) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) .
Proof: We show in Lemma 4, item 4, that for any (ȳ, ū) ∈ Yd , dist(ȳ, rel∂ΠYd ) ≥
dist((ȳ, ū), rel∂Yd ). If either CY or R is not a subspace, use items 1.(b), 2.(b), and 3.(d)
from Theorem 9 and apply the definition of gYd to obtain
gYd ≤ (1 + ε) max{1, ‖A‖} gCY∗,R∗(y0) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) .
Since now the left side is independent of ε, take the limit as ε → 0. If both CY and R
are subspaces we obtain the stronger bound
gYd ≤ gCY∗,R∗(y0) (1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d))
by using item 1.(b) from Theorem 9, items 2’.(b) and 3’.(d) from Theorem 10, and the
definition of gYd .
We now state Lemma 4; we start by defining the following set:

Ỹd := { (y, u) ∈ IRm × IR | (c − A^t y, u) ∈ C∗ } .        (28)
Note that the dual feasible region Yd is recovered from Ỹd as Yd = Ỹd ∩ (CY∗ × IR). The following lemma, whose proof is deferred to the Appendix, relates a variety of distances to relative boundaries of sets arising in the dual problem:

Lemma 4 Given a dual feasible point (y, u) ∈ Yd, let s = c − A^t y ∈ effdom u(·). Then:

1. dist((y, u), rel∂(CY∗ × IR)) = dist(y, rel∂CY∗) .

2. dist((y, u), rel∂Ỹd) ≥ (1/max{1, ‖A‖}) dist((s, u), rel∂C∗) .

3. dist((y, u), rel∂Yd) ≥ (1/max{1, ‖A‖}) min { dist((s, u), rel∂C∗) , dist(y, rel∂CY∗) } .

4. dist(y, rel∂ΠYd) ≥ dist((y, u), rel∂Yd) .
We now proceed with the proofs of the results of this subsection.
Proof of Proposition 6: If y = 0 the result is true. If y 6= 0, then Proposition 9
shows that there exists ŷ such that kŷk = 1 and ŷ t y = kyk∗ . For any ε > 0, define the
following perturbed problem instance:

Ā = A − (1/‖y‖∗) ŷ c^t ,   b̄ = b + (((−b^t y + u)^+ + ε)/‖y‖∗) ŷ ,   c̄ = c .

We note that, for the data d̄ = (Ā, b̄, c̄), the point (y, u) satisfies (A2d̄) in Lemma 1, and therefore (GPd̄) is infeasible. We conclude that ρP(d) ≤ ‖d − d̄‖, which implies

ρP(d) ≤ max { ‖c‖∗ , (−b^t y + u)^+ + ε } / ‖y‖∗ ,

and so

ρP(d) ≤ max { ‖c‖∗ , −(b^t y − u) } / ‖y‖∗ .
The following technical lemma, which concerns the optimization problem (DP ) below, is used in the subsequent proofs. Problem (DP ) is parameterized by given points
y 0 ∈ CY∗ and s0 ∈ R∗ , and is defined by
(DP)   maxy,δ,s,θ  θ
       s.t.  −A^t y + δc − s = θ (A^t y0 − c + s0)
             ‖y‖∗ + |δ| ≤ 1
             y ∈ CY∗
             δ ≥ 0
             s ∈ R∗ .        (29)
Lemma 5 Consider any y 0 ∈ CY∗ and s0 ∈ R∗ such that At y 0 + s0 6= c. If ρD (d) > 0,
then there exists a point (y, δ, s, θ) feasible for problem (DP ) that satisfies
θ ≥ ρD(d) / ‖c − A^t y0 − s0‖∗ > 0 .        (30)
Proof: Note that problem (DP ) is feasible for any y 0 and s0 since (y, δ, s, θ) = (0, 0, 0, 0)
is always feasible. Therefore it can either be unbounded or have a finite optimal objective
value. If (DP ) is unbounded, we can find feasible points with an objective function large
enough such that (30) holds. If (DP ) has a finite optimal value, say θ∗ , then it follows
from elementary arguments that it attains this value. Since ρD (d) > 0 implies Yd 6= ∅,
Theorem 6 implies that the optimal solution (y ∗ , δ ∗ , s∗ , θ∗ ) for (DP ) satisfies (30).
Proof of Proposition 7: Assume c − At y 0 6∈ relintR∗ , otherwise from Proposition 1,
the point (ȳ, ū) = (y 0 , u(c − At y 0 )) satisfies the assertion of the proposition. We consider
problem (DP ), defined by (29), with y 0 and s0 ∈ relintR∗ such that kc − At y 0 − s0 k ≤
dist(c−At y 0 , R∗ )+ε. From Lemma 5 we have that there exists a point (y, δ, s, θ) feasible
for (DP ) that satisfies
θ ≥ ρD(d)/‖c − A^t y0 − s0‖∗ ≥ ρD(d)/(dist(c − A^t y0, R∗) + ε) .

Define

ȳ = (y + θy0)/(δ + θ)   and   s̄ = (s + θs0)/(δ + θ) .

By construction we have ȳ ∈ CY∗, c − A^t ȳ = s̄ ∈ relint R∗. Therefore from Proposition 1 (ȳ, u(c − A^t ȳ)) ∈ Yd, and letting ξ = max{1, ‖y0‖∗} we have

‖ȳ − y0‖∗ = ‖y − δy0‖∗/(δ + θ) ≤ ((‖y‖∗ + δ) ξ)/θ ≤ ((dist(c − A^t y0, R∗) + ε)/ρD(d)) ξ .
Proof of Theorem 9: Note that ρD (d) > 0 implies Yd 6= ∅; note also that ρD (d) is
finite, otherwise Proposition 2 shows that R = {0} which is a subspace. Set s0 ∈ R∗
such that ‖s0‖∗ = ‖A‖ and τR∗ = dist(s0, rel∂R∗)/‖s0‖∗. We also assume for now that c − A^t y0 ≠ s0. We show later in the proof how to handle the case when c − A^t y0 = s0. Denote ry0 = dist(y0, rel∂CY∗), and rs0 = dist(s0, rel∂R∗) = τR∗ ‖A‖ > 0.
With the points y0 and s0, use Lemma 5 to obtain a point (y, δ, s, θ) feasible for (DP) which from inequality (30) satisfies

0 < 1/θ ≤ (‖c − A^t y0‖∗ + ‖A‖)/ρD(d) .        (31)
Define the following:

ȳ = (y + θy0)/(δ + θ) ,   s̄ = (s + θs0)/(δ + θ) ,   rȳ = θ ry0/(δ + θ) ,   rs̄ = θ rs0/(δ + θ) .

By construction dist(ȳ, rel∂CY∗) ≥ rȳ, dist(s̄, rel∂R∗) ≥ rs̄, and c − A^t ȳ = s̄. Therefore, from Proposition 1 the point (ȳ, u(s̄)) ∈ Yd. We now choose ū so that (ȳ, ū) ∈ Yd and bound its distance to the relative boundary. Since relint R∗ ⊆ effdom u(·), from Proposition 11 and Proposition 12 we have that for any ε > 0, the ball B(s̄, rs̄/(1 + ε)) ∩ LR∗ ⊂ relint effdom u(·). Define the function µ̃(·, ·) by

µ̃(s̄, κ) := κ (1/‖A‖) + sup { u(s) | ‖s − s̄‖∗ ≤ κ , s ∈ R∗ } .

Note that µ̃(·, ·) is finite for every s̄ ∈ relint effdom u(·) and κ ∈ [0, dist(s̄, rel∂R∗)), because it is defined as the supremum of the continuous function u(·) over a closed and bounded subset contained in the relative interior of its effective domain, see Theorem 10.1 of [18]. We define ū = µ̃(s̄, rs̄/(1 + ε)), and since ū ≥ rs̄/(‖A‖(1 + ε)) + u(s̄) ≥ u(s̄), the point (ȳ, ū) ∈ Yd. Let us now bound dist((ȳ, ū), rel∂Yd). Consider any vector v ∈ LCY∗ ∩ {y | −A^t y ∈ LR∗} such that ‖v‖∗ ≤ 1; then

ȳ + αv ∈ CY∗ for any |α| ≤ rȳ ,

and

c − A^t(ȳ + αv) = s̄ + α(−A^t v) ∈ B(s̄, rs̄/(1 + ε)) ∩ LR∗ for any |α| ≤ rs̄/(‖A‖(1 + ε)) .

This last inclusion implies that (c − A^t(ȳ + αv), ū + β) = (s̄ + α(−A^t v), ū + β) ∈ C∗ for any |α|, |β| ≤ rs̄/(‖A‖(1 + ε)). We have shown that dist(ȳ, rel∂CY∗) ≥ rȳ and dist((c − A^t ȳ, ū), rel∂C∗) ≥ rs̄/(‖A‖(1 + ε)). Therefore item 3 of Lemma 4 implies

dist((ȳ, ū), rel∂Yd) ≥ (1/max{1, ‖A‖}) min { rȳ , rs̄/(‖A‖(1 + ε)) }
                      ≥ (1/((1 + ε) max{1, ‖A‖})) (θ/(δ + θ)) min { ry0 , rs0/‖A‖ }
                      = θ min{ry0, τR∗} / ((1 + ε) max{1, ‖A‖}(δ + θ)) .
To finish the proof, we bound the different expressions in the statement of the theorem; let ξ = max{1, kAk} to simplify notation. Here we use inequality (31):
1. (a) ‖ȳ − y0‖∗ = ‖y − δy0‖∗/(δ + θ) ≤ (1/θ) max{1, ‖y0‖∗} ≤ ((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}.

   (b) ‖ȳ‖∗ ≤ ‖y‖∗(1/θ) + ‖y0‖∗ ≤ 1/θ + ‖y0‖∗ ≤ ‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d).

2. (a) 1/dist(ȳ, rel∂CY∗) ≤ 1/rȳ = (δ + θ)/(θ ry0) ≤ (1/ry0)(1 + 1/θ) ≤ (1/ry0)(1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

   (b) 1/dist((ȳ, ū), rel∂Yd) ≤ ((1 + ε)ξ/min{ry0, τR∗}) (δ + θ)/θ ≤ ((1 + ε)ξ/min{ry0, τR∗})(1 + 1/θ) ≤ ((1 + ε)ξ/min{ry0, τR∗})(1 + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

3. (a) ‖ȳ − y0‖∗/dist(ȳ, rel∂CY∗) ≤ ‖y − δy0‖∗/(θ ry0) ≤ (1/ry0)(1/θ) max{1, ‖y0‖∗} ≤ (1/ry0)((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}.

   (b) ‖ȳ − y0‖∗/dist((ȳ, ū), rel∂Yd) ≤ (1 + ε)ξ ‖y − δy0‖∗/(θ min{ry0, τR∗}) ≤ ((1 + ε)ξ/min{ry0, τR∗})(1/θ) max{1, ‖y0‖∗} ≤ ((1 + ε)ξ/min{ry0, τR∗})((‖c − A^t y0‖∗ + ‖A‖)/ρD(d)) max{1, ‖y0‖∗}.

   (c) ‖ȳ‖∗/dist(ȳ, rel∂CY∗) ≤ ‖y + θy0‖∗/(θ ry0) ≤ (1/ry0)(‖y0‖∗ + 1/θ) ≤ (1/ry0)(‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

   (d) ‖ȳ‖∗/dist((ȳ, ū), rel∂Yd) ≤ ‖y + θy0‖∗(1 + ε)ξ/(θ min{ry0, τR∗}) ≤ ((1 + ε)ξ/min{ry0, τR∗})(‖y0‖∗ + 1/θ) ≤ ((1 + ε)ξ/min{ry0, τR∗})(‖y0‖∗ + (‖c − A^t y0‖∗ + ‖A‖)/ρD(d)).

For the case c − A^t y0 = s0, define ȳ = y0 and ū = µ̃(s0, τR∗ ‖A‖/(1 + ε)). The proof then proceeds exactly as above, except that now we show that dist((c − A^t ȳ, ū), rel∂C∗) ≥ τR∗/(1 + ε), which implies that dist((ȳ, ū), rel∂Yd) ≥ (1/(max{1, ‖A‖}(1 + ε))) min{τR∗, ry0} from item 3 of Lemma 4. This inequality is then used to prove each item in the theorem.
Finally, we note that Theorem 10 can be proved using almost identical arguments as
in the proof of Theorem 9, but with a careful analysis to handle the special cases when
R or CY are subspaces, see [12] for the exact details.
6  Sensitivity under Perturbation
In this section we present several results that bound the deformation of primal and dual
feasible regions and objective function values under data perturbation. All proofs are
deferred to the end of the section.
The following two theorems bound the deformation of the primal and dual feasible
regions under data perturbation. These results are essentially extensions of Assertion 2
of Theorem 1.1 of [16] to the primal and dual problems in the GSM format.
Theorem 11 Suppose that ρP (d) > 0. Let ∆d = (∆A, ∆b, ∆c) be such that Xd+∆d 6= ∅
and consider any x0 ∈ Xd+∆d . Then there exists x̄ ∈ Xd satisfying
‖x̄ − x0‖ ≤ (‖∆b‖ + ‖∆A‖ ‖x0‖) max{1, ‖x0‖}/ρP(d) .
Theorem 12 Suppose that ρD (d) > 0. Let ∆d = (∆A, ∆b, ∆c) be such that Yd+∆d ≠ ∅
and consider any (y 0 , u0 ) ∈ Yd+∆d . Then for any ε > 0, there exists (ȳ, ū) ∈ Yd satisfying

kȳ − y 0 k∗ ≤ (k∆ck∗ + k∆Akky 0 k∗ + ε) max{1, ky 0 k∗ }/ρD (d) .
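As a simple illustration of Theorem 11 (the instance and the choice of data norm below are ours, for illustration only), take n = m = 1, P = IR+ , CY = IR+ , and d = (A, b, c) = (1, 1, 1), so that Xd = [1, ∞). Assuming the data norm k∆dk = max{k∆Ak, k∆bk, k∆ck}, the primal can only be made infeasible by driving A to zero or below, so ρP (d) = 1; a similar calculation gives ρD (d) = 1, hence ρ(d) = 1. For the perturbation ∆d = (0, −1/2, 0) we have Xd+∆d = [1/2, ∞); taking x0 = 1/2, Theorem 11 guarantees a point x̄ ∈ Xd with kx̄ − x0 k ≤ (1/2 + 0) max{1, 1/2}/1 = 1/2, and indeed x̄ = 1 attains this bound exactly.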
The next two results bound changes in optimal objective function values under data
perturbation. Proposition 8 and Theorem 13 below respectively extend to the ground-set
model format Lemma 3.9 and Assertion 5 of Theorem 1.1 in [16].
Proposition 8 Suppose that d ∈ F and ρ(d) > 0. Let ∆d = (0, ∆b, 0) be such that
Xd+∆d ≠ ∅. Then,

z∗ (d + ∆d) − z∗ (d) ≥ −k∆bk max{kck∗ , −z∗ (d)}/ρP (d) .
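Continuing the one-dimensional illustration above (again not part of the development), the perturbation ∆d = (0, −1/2, 0) gives z∗ (d) = 1 and z∗ (d + ∆d) = 1/2, so z∗ (d + ∆d) − z∗ (d) = −1/2, while the right-hand side of Proposition 8 equals −(1/2) max{1, −1}/1 = −1/2; the bound holds with equality for this instance.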
Theorem 13 Suppose that d ∈ F and ρ(d) > 0. Let ∆d = (∆A, ∆b, ∆c) satisfy k∆dk <
ρ(d). Then, if x∗ and x̂ are optimal solutions for (GPd ) and (GPd+∆d ) respectively,

|z∗ (d + ∆d) − z∗ (d)| ≤ k∆bk max{kck∗ + k∆ck∗ , −z∗ (d)}/(ρP (d) − k∆dk)
 + ( k∆ck∗ + k∆Ak max{kck∗ + k∆ck∗ , −z∗ (d)}/(ρP (d) − k∆dk) ) max{kx∗ k, kx̂k} .
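For the same illustrative one-dimensional instance and perturbation ∆d = (0, −1/2, 0), we have k∆dk = 1/2 < ρ(d) = 1, x∗ = 1, and x̂ = 1/2, and the right-hand side of Theorem 13 evaluates to (1/2)(1)/(1 − 1/2) + (0 + 0) max{1, 1/2} = 1, which indeed bounds |z∗ (d + ∆d) − z∗ (d)| = 1/2.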
Proof of Theorem 11: We consider problem (P P ), defined by (22), with x0 equal to the
given point and w0 such that (A + ∆A)x0 − (b + ∆b) = w0 ∈ CY . From Lemma 3 we have that there
exists a point (x, t, w, θ) feasible for (P P ) that satisfies

θ ≥ ρP (d)/kb − Ax0 + w0 k = ρP (d)/k∆Ax0 − ∆bk ≥ ρP (d)/(k∆Akkx0 k + k∆bk) .

We define

x̄ = (x + θx0 )/(t + θ) ,   w̄ = (w + θw0 )/(t + θ) .

By construction we have that x̄ ∈ P , Ax̄ − b = w̄ ∈ CY , therefore x̄ ∈ Xd , and

kx̄ − x0 k = kx − tx0 k/(t + θ) ≤ (kxk + t) max{1, kx0 k}/θ ≤ [(k∆Akkx0 k + k∆bk)/ρP (d)] max{1, kx0 k} .
Proof of Theorem 12: From Proposition 11 we have that for any ε > 0 there exists
ξ ≠ ∆At y 0 − ∆c such that kξk∗ ≤ ε and c + ∆c + ξ − (A + ∆A)t y 0 ∈ relint R∗ . We consider
problem (DP ), defined by (29), with y 0 equal to the given point and s0 := c + ∆c + ξ − (A + ∆A)t y 0 ∈ relint R∗ .
From Lemma 5 we have that there exists a point (y, δ, s, θ) feasible for (DP ) that satisfies

θ ≥ ρD (d)/kc − At y 0 − s0 k∗ = ρD (d)/k∆At y 0 − ∆c − ξk∗ ≥ ρD (d)/(k∆ck∗ + k∆Akky 0 k∗ + ε) .

We define

ȳ = (y + θy 0 )/(δ + θ) ,   s̄ = (s + θs0 )/(δ + θ) .

By construction we have that ȳ ∈ CY∗ and c − At ȳ = s̄ ∈ relint R∗ ⊆ effdom u(·), from
Proposition 11 and Proposition 12. Therefore from Proposition 1, (ȳ, u(c − At ȳ)) ∈ Yd
and

kȳ − y 0 k∗ = ky − δy 0 k∗ /(δ + θ) ≤ (kyk∗ + δ) max{1, ky 0 k∗ }/θ ≤ [(k∆ck∗ + k∆Akky 0 k∗ + ε)/ρD (d)] max{1, ky 0 k∗ } .
Proof of Proposition 8: The hypothesis that ρ(d) > 0 implies that the GSM format
problem with data d has zero duality gap and (GPd ) and (GDd ) attain their optimal
values, see Corollary 1. Also, since Yd+∆d = Yd ≠ ∅ has a Slater point (because ρD (d) > 0),
and Xd+∆d ≠ ∅, (GPd+∆d ) and (GDd+∆d ) have no duality gap and (GPd+∆d ) attains
its optimal value, see Theorem 2. Let (y, u) ∈ Yd be an optimal solution of (GDd ); due
to the form of the perturbation, the point (y, u) ∈ Yd+∆d , and therefore
z ∗ (d + ∆d) ≥ (b + ∆b)t y − u = z ∗ (d) + ∆bt y ≥ z ∗ (d) − k∆bkkyk∗ .
The result now follows using the bound on the norm of dual feasible solutions from
Proposition 6 and the strong duality for data instances d and d + ∆d.
Proof of Theorem 13: The hypotheses ρ(d) > 0 and k∆dk < ρ(d) imply that ρ(d + ∆d) > 0,
and hence that the GSM format problems with data d and d + ∆d both have zero duality gap and all
problems attain their optimal values, see Corollary 1.
Let x̂ ∈ Xd+∆d be an optimal solution for (GPd+∆d ). Define the perturbation ∆d̃ =
(0, ∆b − ∆Ax̂, 0). Then by construction the point x̂ ∈ Xd+∆d̃ . Therefore

z∗ (d + ∆d) = (c + ∆c)t x̂ ≥ −k∆ck∗ kx̂k + ct x̂ ≥ −k∆ck∗ kx̂k + z∗ (d + ∆d̃) .

Invoking Proposition 8, we bound the optimal objective function value for the problem
instance d + ∆d̃:

z∗ (d + ∆d) + k∆ck∗ kx̂k ≥ z∗ (d + ∆d̃) ≥ z∗ (d) − k∆b − ∆Ax̂k max{kck∗ , −z ∗ (d)}/ρP (d) .

Therefore

z∗ (d + ∆d) − z∗ (d) ≥ −k∆ck∗ kx̂k − (k∆bk + k∆Akkx̂k) max{kck∗ , −z ∗ (d)}/ρP (d) .
Changing the roles of d and d + ∆d, we can construct the following upper bound:

z∗ (d + ∆d) − z∗ (d) ≤ k∆ck∗ kx∗ k + (k∆bk + k∆Akkx∗ k) max{kc + ∆ck∗ , −z ∗ (d + ∆d)}/ρP (d + ∆d) ,
where x∗ ∈ Xd is an optimal solution for (GPd ). The value −z ∗ (d + ∆d) can be replaced
by −z ∗ (d) on the right side of the previous bound. To see this consider two cases. If
−z ∗ (d + ∆d) ≤ −z ∗ (d), then we can do the replacement since it yields a larger bound. If
−z ∗ (d + ∆d) > −z ∗ (d), the inequality above has a negative left side and a positive right
side after the replacement. Note also that because of the hypothesis k∆dk < ρ(d), the
distance to infeasibility satisfies ρP (d + ∆d) ≥ ρP (d) − k∆dk > 0. We finish the proof
by combining the previous two bounds, incorporating the lower bound on ρP (d + ∆d), and
using strong duality for the data instances d and d + ∆d.
7 Concluding Remarks
We have shown herein that most of the essential results regarding condition numbers for
conic convex optimization problems can be extended to the non-conic ground-set model
format (GPd ). We have attempted herein to highlight the most important and/or useful
extensions; for other results see [12].
It is interesting to note the absence of results that directly bound z∗ (d) or the norms
of optimal solutions kx∗ k, ky ∗ k of (GPd ) and (GDd ) as in Assertions 3 and 4 of Theorem
1.1 of [16]. Such bounds are very important in relating the condition number theory
to the complexity of algorithms. However, we do not believe that such bounds can
be demonstrated for (GPd ) without further assumptions. The reason for this is subtle
yet simple. Observe from Theorem 6 that ρD (d) depends only on d = (A, b, c), CY ,
and the recession cone R of P . That is, P only affects ρD (d) through its recession
cone, and so information about the “bounded” portion of P is irrelevant to the value of
ρD (d). For this reason it is not possible to bound the norm of primal optimal solutions x
directly, and hence one cannot bound z∗ (d) directly either. Under rather mild additional
assumptions, it is possible to analyze the complexity of algorithms for solving (GPd ),
see [12] as well as a forthcoming paper on this topic.
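A small example (ours, and only illustrative) makes this concrete: for the family of instances with n = m = 1, A = 1, b = 0, c = 1, CY = IR+ , and P = [M, ∞) with M ≥ 0, the data d, the cone CY , and the recession cone R = IR+ are the same for every M , so ρD (d) is the same for every M , yet the optimal solution x∗ = M and the optimal value z∗ (d) = M grow without bound as M increases.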
Note that the characterization results for ρP (d) and ρD (d) presented herein in Theorems 5 and 6 pertain only to the case when d ∈ F. A characterization of ρ(d) for d ∉ F
is the subject of future research.
8 Appendix
This appendix contains supporting mathematical results that are used in the proofs of
the results of this paper. We point the reader to existing proofs for the most well known
results.
Proposition 9 (Proposition 2 of [8]) Let X be an n-dimensional normed vector space
with dual space X ∗ . For every x ∈ X, there exists x̄ ∈ X ∗ with the property that kx̄k∗ = 1
and kxk = x̄t x.
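For example (an illustration, not part of [8]), if X = IRn is endowed with the ℓ1 norm, then for x ≠ 0 one may take x̄ to be the vector of signs of the components of x, so that kx̄k∗ = kx̄k∞ = 1 and x̄t x = kxk1 = kxk.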
Proposition 10 (Theorems 11.1 and 11.3 of [18]) Given two non-empty convex sets
S and T in IRn , then relint S ∩ relint T = ∅ if and only if S and T can be properly
separated, i.e., there exists y ≠ 0 such that

inf x∈S y t x ≥ sup z∈T y t z   and   sup x∈S y t x > inf z∈T y t z .
The following is a restatement of Corollary 14.2.1 of [18], which relates the effective
domain of u(·) of (14) to the recession cone of P , where recall that R∗ denotes the dual
of the recession cone R defined in (8).
Proposition 11 (Corollary 14.2.1 of [18]) Let R denote the recession cone of the nonempty
convex set P , and define u(·) by (14). Then cl effdom u(·) = R∗ .
Proposition 12 (Theorem 6.3 of [18]) For any convex set Q ⊆ IRn , cl relint Q = cl Q,
and relint cl Q = relint Q.
The following lemma is central in relating the two alternative characterizations of
the distance to infeasibility and is used in the proofs in Section 4.
Lemma 6 Consider two closed convex cones C ⊆ IRn and CY ⊆ IRm , and data (M, v) ∈
IRm×n × IRm . Strong duality holds between
(P ) : z∗ = min kM t y + qk∗
s.t. y t v ≥ 1
y ∈ CY∗
q ∈ C∗
and (D) : z ∗ = max θ
s.t. M x − θv ∈ CY
kxk ≤ 1
θ≥0
x∈C .
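As a quick sanity check of the lemma (our example, with all data chosen for illustration), take n = m = 1, C = CY = IR+ , M = 1, and v = 1. In (P ) the constraints force y ≥ 1 and q ≥ 0, so z∗ = 1; in (D) the constraints read x − θ ≥ 0, |x| ≤ 1, θ ≥ 0, x ≥ 0, so z ∗ = 1, and the two optimal values indeed coincide.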
Proof: The proof that weak duality holds between (P ) and (D) is straightforward,
therefore z ∗ ≤ z∗ . Assume z ∗ < z∗ , and set ε > 0 such that 0 ≤ z ∗ < z∗ − ε. Consider
the following nonempty convex set S:
S := { (u, δ, α) | ∃ y, q s.t. y + u ∈ CY∗ , q + δ ∈ C ∗ , y t v ≥ 1 − α, kM t y + qk∗ ≤ z∗ − ε } .
Then (0, 0, 0) ∉ S, and from Proposition 10 there exists (z, x, θ) ≠ 0 such that z t u +
xt δ + θα ≥ 0 for any (u, δ, α) ∈ S. For any y ∈ IRm , ũ ∈ CY∗ , δ̃ ∈ C ∗ , π ≥ 0, and q̃ such
that kq̃k∗ ≤ z∗ − ε, define q = −M t y + q̃, u = −y + ũ, δ = −q + δ̃, and α = 1 − y t v + π.
This construction implies that the point (u, δ, α) ∈ S, and that for all y, ũ ∈ CY∗ ,
δ̃ ∈ C ∗ , π ≥ 0, and kq̃k∗ ≤ z∗ − ε it holds that:
0 ≤ z t (−y + ũ) + xt (M t y − q̃ + δ̃) + θ(1 − y t v + π)
= y t (M x − θv − z) + z t ũ + xt δ̃ − xt q̃ + θ + θπ .
This implies that M x − θv = z ∈ CY , x ∈ C, θ ≥ 0, and θ ≥ xt q̃ for kq̃k∗ ≤ z∗ − ε. If
x ≠ 0, re-scale (z, x, θ) such that kxk = 1 and then (x, θ) is feasible for (D). Set q̃ =
(z∗ − ε)q̂, where q̂ is given by Proposition 9 and is such that kq̂k∗ = 1 and q̂ t x = kxk = 1.
It then follows that z ∗ ≥ θ ≥ xt q̃ = z∗ − ε > z ∗ , which is a contradiction.
If x = 0, the above expression implies −θv = z ∈ CY , and θ ≥ 0. If θ > 0 then
−v ∈ CY , which means that the point (0, β) is feasible for (D) for any β ≥ 0, implying
that z ∗ = ∞, a contradiction since z ∗ < z∗ . If θ = 0, then z = 0, which is a contradiction
since (z, x, θ) ≠ 0.
The next two lemmas concern properties of the distance to the relative boundary of
a convex set.
Lemma 7 Given convex sets A and B, and a point x ∈ A ∩ B, then
dist(x, rel∂(A ∩ B)) ≥ min {dist(x, rel∂A), dist(x, rel∂B)} .
Proof: The proof of this lemma is based on showing that LA∩B \ (A ∩ B) ⊂ (LA \ A) ∪
(LB \ B). If this inclusion is true, then
dist(x, rel∂(A ∩ B)) = inf x̄∈LA∩B \(A∩B) kx − x̄k
 ≥ inf x̄∈(LA \A)∪(LB \B) kx − x̄k
 = min { inf x̄∈LA \A kx − x̄k , inf x̄∈LB \B kx − x̄k }
 = min {dist(x, rel∂A), dist(x, rel∂B)} ,
which proves the lemma. Therefore we now prove the inclusion. Consider some x̄ ∈
LA∩B ; this means that there exist αi ∈ IR and xi ∈ A ∩ B, i ∈ I a finite set, with Σi∈I αi = 1,
such that x̄ = Σi∈I αi xi . Since xi ∈ A and xi ∈ B, we have that x̄ ∈ LA and x̄ ∈ LB .
Therefore LA∩B ⊂ LA ∩ LB . The desired inclusion is then obtained with a little algebra:

LA∩B \ (A ∩ B) ⊂ LA ∩ LB ∩ (A ∩ B)C
 = LA ∩ LB ∩ (AC ∪ B C )
 = (LA ∩ LB ∩ AC ) ∪ (LA ∩ LB ∩ B C )
 ⊂ (LA ∩ AC ) ∪ (LB ∩ B C )
 = (LA \ A) ∪ (LB \ B) .
Last of all, we have:
Proof of Lemma 4: Equality (1.) is a consequence of the fact that (y, u) ∈ LCY∗ ×IR \
(CY∗ × IR) if and only if y ∈ LCY∗ \ CY∗ , and that k(y, u) − (ȳ, u)k∗ = ky − ȳk∗ + |u − u| =
ky − ȳk∗ .
To prove inequality (2.), we first show that if (ȳ, ū) ∈ LỸd \ Ỹd , then (c − At ȳ, ū) ∈
LC ∗ \ C ∗ . First note that if (ȳ, ū) ∉ Ỹd then, by the definition of Ỹd , (c − At ȳ, ū) ∉ C ∗ ;
therefore we only need to show that if (ȳ, ū) ∈ LỸd then (c − At ȳ, ū) ∈ LC ∗ . Let
(ȳ, ū) ∈ LỸd . Then there exist αi ∈ IR and (yi , ui ) ∈ Ỹd , i ∈ I a finite set, with Σi∈I αi = 1,
such that (ȳ, ū) = Σi∈I αi (yi , ui ). Consider

(c − At ȳ, ū) = (c − At Σi∈I αi yi , Σi∈I αi ui ) = Σi∈I αi (c − At yi , ui ) .

Since (yi , ui ) ∈ Ỹd , then (c − At yi , ui ) ∈ C ∗ and therefore (c − At ȳ, ū) ∈ LC ∗ .
The inclusion above means that

dist((s, u), rel∂C ∗ ) = inf (s̄,ū)∈LC ∗ \C ∗ k(s, u) − (s̄, ū)k∗
 ≤ inf (ȳ,ū)∈LỸd \Ỹd k(s, u) − (c − At ȳ, ū)k∗
 = inf (ȳ,ū)∈LỸd \Ỹd max{ks − (c − At ȳ)k∗ , |u − ū|}
 = inf (ȳ,ū)∈LỸd \Ỹd max{kAt ȳ − At yk∗ , |u − ū|}
 ≤ inf (ȳ,ū)∈LỸd \Ỹd max{kAkkȳ − yk∗ , |u − ū|}
 ≤ inf (ȳ,ū)∈LỸd \Ỹd max{kAk, 1} max{ky − ȳk∗ , |u − ū|}
 ≤ max{kAk, 1} inf (ȳ,ū)∈LỸd \Ỹd (ky − ȳk∗ + |u − ū|)
 = max{kAk, 1} inf (ȳ,ū)∈LỸd \Ỹd k(y, u) − (ȳ, ū)k∗
 = max{kAk, 1} dist((y, u), rel∂ Ỹd ) .
Inequality (3.) follows from the observation that Yd = Ỹd ∩ (CY∗ × IR), Lemma 7, and
the bounds obtained in (1.) and (2.).
The proof of item (4.) uses the fact (soon to be proved) that if ȳ ∈ LΠYd \ ΠYd then
for any ū, (ȳ, ū) ∈ LYd \ Yd . Then from the definition of the distance to the relative
boundary we have
dist((y, u), rel∂Yd ) = inf (ȳ,ū)∈LYd \Yd k(y, u) − (ȳ, ū)k∗
 ≤ inf ȳ∈LΠYd \ΠYd , ū k(y, u) − (ȳ, ū)k∗
 = inf ȳ∈LΠYd \ΠYd , ū (ky − ȳk∗ + |u − ū|)
 = inf ȳ∈LΠYd \ΠYd ky − ȳk∗
 = dist(y, rel∂ΠYd ) ,
which proves inequality (4.). To finish the proof we now show that if ȳ ∈ LΠYd \ ΠYd
then for any ū, (ȳ, ū) ∈ LYd \ Yd . The fact that ȳ ∉ ΠYd implies that for any ū,
(ȳ, ū) ∉ Yd . Now since ȳ ∈ LΠYd , there exist αi ∈ IR and (yi , ui ) ∈ Yd , i ∈ I a finite set,
such that ȳ = Σi∈I αi yi and Σi∈I αi = 1. Since for any (y, u) ∈ Yd and β ≥ 0 the point
(y, u + β) ∈ Yd , we can express the point (ȳ, ū) by the following sum of points in Yd :

(ȳ, ū) = ( Σi∈I αi yi + y − y , Σi∈I αi ui + u + β1 − u − β2 ) ,

where for any ū, β1 = (ū − Σi∈I αi ui )+ and β2 = (ū − Σi∈I αi ui )− . This shows that for
any ū, (ȳ, ū) ∈ LYd , completing the proof.
References
[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming, Theory
and Algorithms. John Wiley & Sons, Inc, New York, second edition, 1993.
[2] F. Cucker and J. Peña. A primal-dual algorithm for solving polyhedral conic systems
with a finite-precision machine. Technical report, GSIA, Carnegie Mellon University,
2001.
[3] M. Epelman and R. M. Freund. A new condition measure, preconditioners, and
relations between different measures of conditioning for conic linear systems. SIAM
Journal on Optimization, 12(3):627–655, 2002.
[4] S. Filipowski. On the complexity of solving sparse symmetric linear programs specified with approximate data. Mathematics of Operations Research, 22(4):769–792,
1997.
[5] S. Filipowski. On the complexity of solving feasible linear programs specified with
approximate data. SIAM Journal on Optimization, 9(4):1010–1040, 1999.
[6] R. M. Freund and J. R. Vera. Condition-based complexity of convex optimization
in conic linear form via the ellipsoid algorithm. SIAM Journal on Optimization,
10(1):155–176, 1999.
[7] R. M. Freund and J. R. Vera. On the complexity of computing estimates of condition
measures of a conic linear system. Technical Report, Operations Research Center,
MIT, August 1999.
[8] R. M. Freund and J. R. Vera. Some characterizations and properties of the “distance
to ill-posedness” and the condition measure of a conic linear system. Mathematical
Programming, 86(2):225–260, 1999.
[9] J. L. Goffin. The relaxation method for solving systems of linear inequalities. Mathematics of Operations Research, 5(3):388–414, 1980.
[10] M. A. Nunez and R. M. Freund. Condition measures and properties of the central
trajectory of a linear program. Mathematical Programming, 83(1):1–28, 1998.
[11] M. A. Nunez and R. M. Freund. Condition-measure bounds on the behavior of
the central trajectory of a semi-definite program. SIAM Journal on Optimization,
11(3):818–836, 2001.
[12] F. Ordóñez. On the Explanatory Value of Condition Numbers for Convex Optimization: Theoretical Issues and Computational Experience. PhD thesis, Massachusetts
Institute of Technology, 2002.
[13] F. Ordóñez and R. M. Freund. Computational experience and the explanatory
value of condition measures for linear optimization. Working Paper OR361-02,
MIT, Operations Research Center, 2002.
[14] J. Peña. Computing the distance to infeasibility: theoretical and practical issues.
Technical report, Center for Applied Mathematics, Cornell University, 1998.
[15] J. Peña and J. Renegar. Computing approximate solutions for convex conic systems
of constraints. Mathematical Programming, 87(3):351–383, 2000.
[16] J. Renegar. Some perturbation theory for linear programming. Mathematical Programming, 65(1):73–91, 1994.
[17] J. Renegar. Linear programming, complexity theory, and elementary functional
analysis. Mathematical Programming, 70(3):279–351, 1995.
[18] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New
Jersey, 1997.
[19] J. R. Vera. Ill-posedness and the computation of solutions to linear programs with
approximate data. Technical Report, Cornell University, May 1992.
[20] J. R. Vera. Ill-Posedness in Mathematical Programming and Problem Solving with
Approximate Data. PhD thesis, Cornell University, 1992.
[21] J. R. Vera. Ill-posedness and the complexity of deciding existence of solutions to
linear programs. SIAM Journal on Optimization, 6(3):549–569, 1996.
[22] J. R. Vera. On the complexity of linear programming under finite precision arithmetic. Mathematical Programming, 80(1):91–123, 1998.