Department of Computer Science Tech. Rep. TR-2005-31, December 2005
University of British Columbia

EXACT REGULARIZATION OF LINEAR PROGRAMS

MICHAEL P. FRIEDLANDER*

* Department of Computer Science, University of British Columbia, Vancouver V6T 1Z4, BC, Canada (mpf@cs.ubc.ca). Research supported by the Natural Sciences and Engineering Research Council of Canada.

Abstract. We show that linear programs (LPs) admit regularizations that either contract the original (primal) solution set or leave it unchanged. Any regularization function that is convex and has compact level sets is allowed; differentiability is not required. This is an extension of the result first described by Mangasarian and Meyer (SIAM J. Control Optim., 17(6), pp. 745-752, 1979). We show that there always exist positive values of the regularization parameter such that a solution of the regularized problem simultaneously minimizes the original LP and minimizes the regularization function over the original solution set. We illustrate the main result using the nondifferentiable ℓ1 regularization function on a set of degenerate LPs. Numerical results demonstrate how such an approach yields sparse solutions from the application of an interior-point method.

Key words. linear programming, regularization, degeneracy, sparse solutions, interior-point methods

1. Introduction. Consider the feasible linear program (LP)

    (P)    minimize_{x ∈ R^n}  c^T x    subject to  Ax = b,  x ≥ 0,

with bounded optimal value, where A ∈ R^{m×n}, b ∈ R^m, and c ∈ R^n. We show that (P) always admits perturbations of its objective function such that solutions of the perturbed problem, henceforth referred to as the regularized LP, continue to be solutions of (P). In fact, a solution of the regularized LP simultaneously solves (P) and minimizes the regularization function over the optimal solution set of (P). We prove this result for any regularization function that is convex and has compact level sets; differentiability is not required. This may be regarded as an extension of Mangasarian and Meyer's result [MM79, Theorem 1], which additionally requires differentiability of the regularization function.

Consider any regularization function φ : R^n → R that is convex and has compact level sets. The regularized LP is given by

    (Pδ)   minimize_x  c^T x + δφ(x)    subject to  Ax = b,  x ≥ 0,

where δ is a constant nonnegative regularization parameter. The regularization function φ may be nonlinear and/or nondifferentiable, so that (Pδ) is not necessarily an LP (although it is convex). In §3 we prove that solutions of (Pδ) are also solutions of (P) for all values of δ below some positive threshold value. Moreover, this set of solutions minimizes φ over the optimal solution set of (P). Interestingly, this positive threshold value always exists. Our proof is constructive, and we show how the threshold value may be computed.

We call such regularizations exact, a term chosen to draw an analogy with the exact penalty functions that are commonly used in algorithms for nonlinear optimization. An exact penalty formulation of a problem can recover a solution of the original problem for all values of the penalty parameter beyond a threshold value. (See, for example, [HM79, Ber82, Fle85], and [CGT00] for a more recent discussion.) As we have described, an analogous property holds for LPs: a solution of the regularized problem is a solution of the original problem for all values of the regularization parameter below some threshold value.
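As a concrete illustration (this small example is ours, not part of the paper's experiments), consider an LP whose entire feasible set is optimal, so that its solution set is a segment; an ℓ1 regularizer then selects the least-ℓ1 point of that segment, and the regularization is exact for every nonnegative δ:

\[
  \text{minimize}_{x \ge 0} \quad x_1 + 2x_2
  \qquad \text{subject to} \qquad x_1 + 2x_2 = 2.
\]
% Every feasible point satisfies c^T x = 2, so the solution set S is the
% segment joining (2,0) and (0,1), with p* = 2. With phi(x) = ||x||_1
% (= x_1 + x_2 on the nonnegative orthant), the regularized objective is
\[
  (1+\delta)\,x_1 + (2+\delta)\,x_2 \;=\; 2 + \delta\,(x_1 + x_2)
  \quad \text{on the feasible set,}
\]
% so for every delta > 0 the unique minimizer is (0,1), precisely the
% point of S with least ell_1 norm. This corresponds to Case 1
% (mu*_phi = 0) of the proof in Section 3, where the threshold value is
% delta_bar = +infinity.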
Regularization is a technique commonly used for solving ill-posed problems whose solutions may not be unique or may be acutely sensitive to the problem data. Regularization amounts to changing the problem statement in order to compute (usually approximate) solutions that are "well behaved." For example, approximate solutions with moderate norms may be preferred over very-large-norm solutions; or solutions of nearby, stable problems may be preferred. (See [Han98] for a comprehensive study of regularization for linear systems, including least-squares problems.)

As with other problem classes, exact regularization of LPs may be useful for various reasons. If an LP does not have a unique optimal solution, exact regularization may be used to select solutions with desirable properties. In particular, Tikhonov regularization [TA77] can be used to select a minimum two-norm solution. Specialized algorithms for computing minimum two-norm solutions of LPs have been proposed in [Man84, Luc87a, Luc87b, ZL02, KQQ03, Man04], among others. Saunders and Tomlin [ST96] and Altman and Gondzio [AG99] consider Tikhonov regularization as a tool for influencing the conditioning of the underlying linear systems that arise in the implementation of large-scale interior-point methods.

If we were to assume differentiability of φ, our main result could be obtained from the KKT conditions of (P) and (Pδ). This is the approach used by Mangasarian and Meyer [MM79]. We can discard the differentiability assumption, however, by instead appealing to Lagrange duality theory. The advantage of this approach is evident because many applications call for nondifferentiable regularization functions. A key example is ℓ1 regularization, which has recently received significant attention in applications closely related to linear programming. Recent work in signal processing has focused on using LPs to obtain sparse solutions (i.e., solutions with few nonzero elements) of underdetermined systems of linear equations Ax = b (with the possible additional condition x ≥ 0); for examples, see [CDS01, CRT04, CRT05, DT05]. In machine learning and statistics, ℓ1 regularization of linear least-squares problems (sometimes called lasso regression) plays a prominent role as an alternative to Tikhonov regularization [Tib96, EHJT04].

To illustrate our main result, we consider the nondifferentiable ℓ1 regularization function as a technique for selecting sparse solutions of LPs that do not have unique primal solutions (i.e., LPs that are dual degenerate). In §4 we give numerical results for ℓ1 regularization on a set of standard LPs from the Netlib test set and on a set of randomly generated LPs with prescribed degeneracy. The numerical results highlight the effectiveness of this approach for encouraging sparse solutions computed via an interior-point method. With the main result of this paper, we can select a regularization parameter δ so that a solution of the regularized LP is a minimum-ℓ1 solution over the optimal solution set of the original LP.

2. Preliminaries. The following assumptions hold implicitly throughout:

Assumption 2.1 (Feasibility and finiteness). The feasible set is nonempty; i.e., there exists a point x̄ such that Ax̄ = b and x̄ ≥ 0. Moreover, the optimal value p* of (P) is bounded; i.e., p* > −∞.

Assumption 2.2 (Convexity and compactness). The regularization function φ is convex, and its level sets {x | φ(x) ≤ β} are compact for each β.
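As a quick check of Assumption 2.2 (this remark is ours, not part of the original text), the two regularizers discussed in this paper satisfy it, while a nonconstant linear function does not:

\[
  \phi(x) = \tfrac12\,\|x\|_2^2
  \qquad\text{and}\qquad
  \phi(x) = \|x\|_1
\]
% are convex, and each level set {x : phi(x) <= beta} is closed and
% bounded, hence compact. By contrast, phi(x) = d^T x (with d != 0) is
% convex but has unbounded level sets, so it is excluded by
% Assumption 2.2.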
The first assumption guarantees that (P) and its dual have optimal solutions. The first and second assumptions together ensure that (Pδ) attains its optimal value and that strong duality holds. (See, for example, Propositions 5.2.1 and 5.2.2 of [Ber99] for these well-known results on linearly constrained convex programs and LPs, respectively.)

Our main result hinges on relating the optimal solutions of (P) and (Pδ) via the (convex) optimization problem

    (Pφ)   minimize_x  φ(x)    subject to  Ax = b,  c^T x ≤ p*,  x ≥ 0.

By construction, any solution of (Pφ) is also a solution of (P). (The converse, however, does not generally hold.) We show that for all values of δ (including 0) below a certain positive threshold value, every solution of (Pδ) is also a solution of (Pφ), and hence also a solution of (P). The dual solutions of one problem are linear combinations of the dual solutions of the other; the linear combination depends on δ and on the Lagrange multiplier corresponding to the constraint c^T x ≤ p*. Note that Lagrange multipliers for (Pφ) always exist because its constraints are linear. (Again, see [Ber99, Proposition 5.2.1].)

We make the following definitions. The Lagrangian functions of (Pδ) and (Pφ) are, respectively,

    L_δ(x, y, z) = c^T x + δφ(x) − y^T(Ax − b) − z^T x,
    L_φ(x, y, z, μ) = φ(x) − y^T(Ax − b) + μ(c^T x − p*) − z^T x,

where y ∈ R^m, z ∈ R^n, and μ ∈ R are independent variables. The associated Lagrange dual functions are given by

    g_δ(y, z) = inf_{x ∈ R^n} L_δ(x, y, z)    and    g_φ(y, z, μ) = inf_{x ∈ R^n} L_φ(x, y, z, μ).

Denote by S and S_δ the sets of (primal) solutions of (P) and (Pδ), respectively. Assumption 2.1 ensures that S and S_δ are nonempty. Denote by p*_δ the optimal value of (Pδ); as defined above, p* is the optimal value of (P).

3. Main result. The following theorem is a precise statement of our main result.

Theorem 3.1. There exists a constant δ̄ > 0 such that S_δ ⊆ S for all δ ∈ [0, δ̄]. Moreover, each element of S_δ minimizes φ over S.

Proof. We first establish a few key properties of the dual optimal values of (P) and (Pφ). Assumption 2.1 implies that there exist optimal dual variables y* and z* ≥ 0 of (P) that satisfy (among other conditions)

    A^T y* + z* = c    and    b^T y* = p*.                                   (3.1)

Assumptions 2.1 and 2.2 imply that there exist optimal dual variables y*_φ, z*_φ ≥ 0, and μ*_φ ≥ 0 of (Pφ) such that

    g_φ(y*_φ, z*_φ, μ*_φ) = b^T y*_φ − μ*_φ p* + inf_{x ∈ R^n} { φ(x) − (A^T y*_φ + z*_φ − μ*_φ c)^T x }     (3.2)

is maximal. Hence, the infimum on the right-hand side of (3.2) is attained at some point x*_φ. (Recall that φ has compact level sets over R^n.) By strong duality, x*_φ is optimal for (Pφ), and

    φ(x*_φ) = g_φ(y*_φ, z*_φ, μ*_φ).                                         (3.3)

We next consider the regularized problem (Pδ). Its feasible set coincides with that of (P), so the assumptions ensure that there exist optimal dual variables y*_δ and z*_δ ≥ 0. We show that these optimal dual variables are in fact linear combinations of the optimal dual variables of (P) and (Pφ). The linear combination takes two forms, depending on whether μ*_φ is zero or positive. We consider these two cases in turn.

Case 1: μ*_φ = 0. We claim that optimal dual variables of (Pδ) are given by

    y*_δ := y* + δ y*_φ,    z*_δ := z* + δ z*_φ.                              (3.4)

Clearly z*_δ ≥ 0, as required.
Evaluate g_δ(y, z) at the pair (y*_δ, z*_δ), and use (3.4) to obtain

    g_δ(y*_δ, z*_δ) ≡ inf_{x ∈ R^n} L_δ(x, y*_δ, z*_δ)
                    = b^T y*_δ + inf_{x ∈ R^n} { δφ(x) − (A^T y*_δ + z*_δ − c)^T x }
                    = b^T y* + δ b^T y*_φ + inf_{x ∈ R^n} { δφ(x) − [A^T y* + z* − c + δ(A^T y*_φ + z*_φ)]^T x }
                    = p* + δ [ b^T y*_φ + inf_{x ∈ R^n} { φ(x) − (A^T y*_φ + z*_φ)^T x } ],       (3.5)

in which we use (3.1) to arrive at the last expression. Moreover, (3.2) and (3.3) (with μ*_φ = 0) imply that the infimum on the right-hand side of (3.5) is attained at the point x*_φ. Substitute (3.2) and (3.3) into (3.5) to obtain

    g_δ(y*_δ, z*_δ) = p* + δ φ(x*_φ).                                        (3.6)

Note that x*_φ is feasible for (Pφ) and for (Pδ), so that

    p* ≥ c^T x*_φ    and    c^T x*_φ + δ φ(x*_φ) ≥ p*_δ.                     (3.7)

It then follows from (3.6) and (3.7) that g_δ(y*_δ, z*_δ) ≥ p*_δ. By weak duality, however, g_δ(y*_δ, z*_δ) ≤ p*_δ, so that in fact g_δ(y*_δ, z*_δ) = p*_δ. This proves that strong duality holds, and that the dual optimum is attained, at the pair (y*_δ, z*_δ) defined by (3.4). Therefore, (3.5) and (3.6) together imply that inf_{x ∈ R^n} L_δ(x, y*_δ, z*_δ) is attained at x*_φ, so that x*_φ is in fact a solution of (Pδ). Moreover, x*_φ is feasible for (P), so (3.7) implies that it is also a solution of (P). We therefore conclude that, when μ*_φ = 0, x*_φ (and, in fact, every solution of (Pφ)) is an optimal solution of (Pδ) and also of (P), for all values of δ ≥ 0. In this case, δ̄ = +∞.

Case 2: μ*_φ > 0. We claim that optimal dual variables of (Pδ) are given by

    y*_δ := (1 − λ) y* + (λ/μ*_φ) y*_φ,    z*_δ := (1 − λ) z* + (λ/μ*_φ) z*_φ,     (3.8)

for any λ ∈ [0, 1]. Clearly z*_δ ≥ 0, as required. Evaluate g_δ(y, z) at this pair, and consider values δ ≡ λ/μ*_φ, to get

    g_δ(y*_δ, z*_δ) ≡ inf_{x ∈ R^n} L_δ(x, y*_δ, z*_δ)
      = b^T y*_δ + inf_{x ∈ R^n} { δφ(x) − (A^T y*_δ + z*_δ − c)^T x }
      = (1 − λ) b^T y* + (λ/μ*_φ) b^T y*_φ
          + inf_{x ∈ R^n} { δφ(x) − [ (1 − λ)(A^T y* + z*) − c + (λ/μ*_φ)(A^T y*_φ + z*_φ) ]^T x }
      = (1 − λ) p* + (λ/μ*_φ) b^T y*_φ
          + inf_{x ∈ R^n} { (λ/μ*_φ) φ(x) − [ (λ/μ*_φ)(A^T y*_φ + z*_φ) − λ c ]^T x }
      = p* + (λ/μ*_φ) [ b^T y*_φ − μ*_φ p* + inf_{x ∈ R^n} { φ(x) − (A^T y*_φ + z*_φ − μ*_φ c)^T x } ],   (3.9)

where we use (3.1) to arrive at the second-to-last equality. As we noted for (3.5), the last infimum on the right-hand side of (3.9) is attained at the point x*_φ (but this time with μ*_φ > 0). Substitute (3.2) and (3.3) into (3.9) to obtain

    g_δ(y*_δ, z*_δ) = p* + (λ/μ*_φ) φ(x*_φ).                                 (3.10)

With the same arguments as used for Case 1, we conclude that, when μ*_φ > 0, x*_φ (and, in fact, every solution of (Pφ)) is an optimal solution of (Pδ) and also of (P), for all values of δ ∈ [0, δ̄] with

    δ̄ := 1/μ*_φ > 0.                                                         (3.11)

The last statement of the theorem follows immediately from the fact that x*_φ solves both (P) and (Pφ). ∎

Note that the range of values admitted by (3.11) is consistent with the extreme case μ*_φ ↓ 0. In that situation, all nonnegative values of the penalty parameter are allowed, as is predicted when μ*_φ = 0 (Case 1 of the theorem).

4. Sparse solutions. In this section we illustrate a practical application of Theorem 3.1. The aim is to obtain sparse solutions of linear programs that do not have unique optimal solutions. For the following discussion we let φ(x) = ‖x‖₁, which clearly satisfies both standing assumptions. Regularization based on the ℓ1 norm has been used in many applications with the goal of obtaining (or approximating) the sparsest solution of underdetermined systems of linear equations and least-squares problems. Some recent examples include [CDS01, DE03, DT05, DET05].
For underdetermined systems of equations that arise in fields such as signal processing, [CDS01], [CRT04], and [DT05] advocate solving the problem

    minimize_x  ‖x‖₁    subject to  Ax = b  (and possibly x ≥ 0)             (4.1)

in order to obtain a sparse solution. (Well-known techniques exist for recasting (4.1) as an LP; one standard variable-splitting reformulation is sketched in §4.2 below.) The sparsest solution is in fact given by minimizing the so-called zero norm ‖x‖₀, which counts the number of nonzero elements of x. The combinatorial nature of that problem, however, makes it computationally intractable for all but the most trivial cases. Interestingly, there exist conditions under which a solution of (4.1) is also the sparsest feasible solution; see [CRT04] and [DT05].

Following this approach, we use Theorem 3.1 as a guide for obtaining least ℓ1-norm solutions of a generic LP

    minimize_x  c^T x    subject to  Ax = b,  l ≤ x ≤ u,                     (4.2)

by solving its regularized version

    minimize_x  c^T x + δ‖x‖₁    subject to  Ax = b,  l ≤ x ≤ u.             (4.3)

(The vectors l and u are lower and upper bounds on x.) In many of the numerical tests given below, the exact ℓ1-regularized solution of (4.2) (given by (4.3) for small-enough values of δ) is considerably sparser than the solution obtained by solving (4.2) directly. In each case, we solve the regularized and unregularized problems with the same interior-point solver. We emphasize that, with an appropriate choice of the regularization parameter, the solution of the regularized LP is also a solution of the original LP.

We consider two sets of test problems in our numerical experiments. The problems of the first set are constructed from random data using the degenerate LP generator described by [Gon03]. Those of the second are derived from the infeasible LPs in the Netlib collection (http://www.netlib.org/lp/infeas/). Both sets of test problems are further described in §§4.1-4.2.

For each test problem we follow the same procedure. We first reformulate the problem as an LP; this problem corresponds to (P). (There may, of course, be upper and lower bounds on x.) We solve the unregularized LP to obtain x* (an unregularized solution), and thus the optimal value p* := c^T x*. Next, in order to obtain the threshold value δ̄ of the regularization parameter, we solve the LP that corresponds to (Pφ). (In the nontrivial case, this value is given by (3.11).) Finally, an exact regularized solution x*_δ is obtained by solving (Pδ) with δ := δ̄/2. (A small scripted illustration of this three-step procedure is given at the end of this preamble.)

We use the interior-point algorithm implemented in CPLEX 9.1 to solve each subproblem. The default CPLEX options are used, except for crossover = 0 and comptol = 1e-10. The first option forces CPLEX to use its barrier algorithm and not to "cross over" to an optimal basic solution. Under certain conditions, we expect the interior method to converge to the analytic center of the optimal face (see [Ye97, Theorems 2.16 and 2.17]). The second option tightens CPLEX's convergence tolerance from its default of 1e-8 to its smallest allowable setting. We do not advocate such a tight tolerance in practice, but this change aids in computing the sparsity of a computed solution, which we determine as

    ‖x‖₀ = card{ x_i  |  |x_i| > √ε }.                                       (4.4)

Here, ε ≈ 2.2 · 10⁻¹⁶ is the relative machine precision; the value √ε is slightly larger than the specified convergence tolerance. The AMPL model and data files used to generate all of the numerical results presented in §§4.1-4.2 can be obtained at http://www.cs.ubc.ca/~mpf/exactreg/.
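The following Python sketch is ours (the paper's experiments used AMPL and CPLEX, not this code); it illustrates the three-step procedure on a tiny dual-degenerate LP. Because x ≥ 0 here, ‖x‖₁ = 1ᵀx, so (Pφ) and (Pδ) remain LPs solvable with scipy.optimize.linprog. The toy data, the sign convention for the dual multiplier, and the ineqlin.marginals field (available with SciPy's "highs" method) are assumptions of this sketch.

import numpy as np
from scipy.optimize import linprog

# Tiny dual-degenerate LP: minimize x3 subject to x1 + x2 + 2*x3 = 2, x >= 0.
# The optimal face is {x3 = 0, x1 + x2 = 2}, so the primal solution is not unique.
c = np.array([0.0, 0.0, 1.0])
A_eq, b_eq = np.array([[1.0, 1.0, 2.0]]), np.array([2.0])
bounds = [(0, None)] * 3
ones = np.ones(3)

# Step 1: solve (P) for the optimal value p*.
res_p = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
p_star = res_p.fun

# Step 2: solve (P_phi): minimize ||x||_1 = 1'x subject to Ax = b,
# c'x <= p*, x >= 0, and read off the multiplier mu of c'x <= p*.
res_phi = linprog(ones, A_ub=c.reshape(1, -1), b_ub=[p_star],
                  A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
# HiGHS reports marginals of '<=' constraints as nonpositive numbers in a
# minimization, so negate to obtain the Lagrange multiplier mu >= 0.
mu = -res_phi.ineqlin.marginals[0]
delta_bar = np.inf if mu <= 0 else 1.0 / mu   # threshold (3.11); Case 1 gives +inf

# Step 3: solve (P_delta) with delta := delta_bar / 2.
# (In Case 1, any delta > 0 works; we pick 1.0 arbitrarily.)
delta = 1.0 if np.isinf(delta_bar) else delta_bar / 2
res_delta = linprog(c + delta * ones, A_eq=A_eq, b_eq=b_eq,
                    bounds=bounds, method="highs")
x_delta = res_delta.x

# Exactness check: x_delta should also solve (P), i.e., c'x_delta == p*.
print("p* =", p_star, " delta_bar =", delta_bar)
print("x_delta =", x_delta, " c'x_delta =", c @ x_delta)

For this toy problem, p* = 0 and the threshold works out to δ̄ = 1: for δ < 1 the regularized solution stays on the optimal face of (P), while for δ > 1 it jumps to the infeasible-for-(P)-optimality point (0, 0, 1).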
4.1. Random LPs. Six dual-degenerate LPs were constructed using Gonzaga's Matlab generator [Gon03]. This Matlab program accepts as inputs the problem size and the dimensions of the optimal primal and dual faces, D_p and D_d, respectively. Gonzaga shows that these quantities must satisfy

    0 ≤ D_p ≤ n − m − 1    and    0 ≤ D_d ≤ m − 1.                           (4.5)

The six LPs are constructed with parameters n = 1000, m = 100, D_d = 0, and D_p set at 0%, 20%, 40%, 60%, 80%, and 100% of its maximum of 899 (given by (4.5)). The problems are respectively labeled random-0, random-20, random-40, and so on.

Table 4.1 summarizes the results. We confirm that in each case the optimal values of the unregularized and regularized problems are nearly identical (at least to within the specified tolerance). Except for the "control" problem random-0, the exact regularized solution x*_δ has a strictly smaller ℓ1 norm, and is considerably sparser, than the unregularized solution x*.

Table 4.1: Random LPs generated with increasing dimension of the optimal primal face.

    LP           c^T x*    c^T x*_δ   ‖x*‖₁     ‖x*_δ‖₁   ‖x*‖₀   ‖x*_δ‖₀   δ̄
    random-0     2.5e−13   1.0e−13    9.1e+01   9.1e+01    100     100      1.5e−04
    random-20    5.6e−13   6.6e−13    2.9e+02   2.0e+02    278     100      2.2e−02
    random-40    3.8e−12   3.7e−12    4.9e+02   2.9e+02    459     100      2.9e−02
    random-60    3.9e−14   9.2e−11    6.7e+02   3.6e+02    637     101      3.3e−02
    random-80    9.1e−12   8.4e−13    8.9e+02   4.6e+02    816     100      2.1e−01
    random-100   1.8e−16   3.2e−12    1.0e+03   5.4e+02    997     102      1.1e−01

4.2. Infeasible LPs. The second set of problems is derived from a subset of the infeasible Netlib LPs. For each infeasible LP, we discard the original objective and instead form the problem

    (Pi)    minimize_x  ‖Ax − b‖₁    subject to  l ≤ x ≤ u,

and its regularized counterpart

    (Piδ)   minimize_x  ‖Ax − b‖₁ + δ‖x‖₁    subject to  l ≤ x ≤ u.

The unregularized problem (Pi) mimics the plausible situation in which we wish to fit a set of infeasible equations in the ℓ1-norm sense. But because the ℓ1 norm is not strictly convex, a solution of (Pi) may not be unique. The regularized problem (Piδ) might be used in practice to further restrict that solution set.
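For completeness, here is the standard reformulation (our addition; the text above only alludes to "well-known techniques") that turns (Piδ) into an LP by splitting the residual and the variable into nonnegative parts:

\[
\begin{array}{ll}
  \underset{x,\;r^+,\;r^-,\;x^+,\;x^-}{\text{minimize}}
      & \mathbf{1}^T (r^+ + r^-) \;+\; \delta\, \mathbf{1}^T (x^+ + x^-) \\[4pt]
  \text{subject to}
      & Ax - b \;=\; r^+ - r^-, \qquad x \;=\; x^+ - x^-, \\[2pt]
      & l \le x \le u, \qquad r^+,\, r^-,\, x^+,\, x^- \;\ge\; 0.
\end{array}
\]
% At a solution, r_i^+ r_i^- = 0 and (for delta > 0) x_i^+ x_i^- = 0, so
% 1'(r^+ + r^-) = ||Ax - b||_1 and 1'(x^+ + x^-) = ||x||_1. The same
% splitting with delta = 0 and Ax = b imposed as a hard constraint
% recasts (4.1) as an LP.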
The following infeasible Netlib LPs were eliminated because CPLEX returned an error during the solution of (Pi) or (Piδ): lpi-bgindy, lpi-cplex2, lpi-gran, lpi-klein1, lpi-klein2, lpi-klein3, lpi-qual, lpi-refinery, lpi-vol1.

Table 4.2 summarizes the results. The exact regularized solution x*_δ has a smaller or equal ℓ1 norm than the unregularized solution x* in all cases. In 60% of the cases, the regularized solution is sparser than the unregularized solution. In 25% of the cases, the solutions are equally sparse. In 15% of the cases (lpi-galenet, lpi-itest6, and lpi-woodinfe), the regularized solutions are actually less sparse, even though their ℓ1 norm is smaller.

Table 4.2: Least-ℓ1 feasible solutions of the infeasible Netlib LPs.

    LP             c^T x*    c^T x*_δ   ‖x*‖₁     ‖x*_δ‖₁   ‖x*‖₀   ‖x*_δ‖₀   δ̄
    lpi-bgdbg1     3.6e+02   3.6e+02    1.6e+04   1.3e+04    518     437      3.3e−03
    lpi-bgetam     5.4e+01   5.4e+01    6.0e+03   5.3e+03    633     441      3.4e−04
    lpi-bgprtr     1.9e+01   1.9e+01    4.7e+03   3.0e+03     25      20      3.7e−01
    lpi-box1       1.0e+00   1.0e+00    5.2e+02   2.6e+02    261     261      9.9e−01
    lpi-ceria3d    2.5e−01   2.5e−01    8.8e+02   8.8e+02   1780    1767      6.7e−04
    lpi-chemcom    9.8e+03   9.8e+03    1.5e+05   3.8e+04    711     591      3.1e−01
    lpi-cplex1     3.2e+06   3.2e+06    2.4e+09   1.5e+09   3811    3489      1.0e−02
    lpi-ex72a      1.0e+00   1.0e+00    4.8e+02   3.0e+02    215     215      1.6e−01
    lpi-ex73a      1.0e+00   1.0e+00    4.6e+02   3.0e+02    211     211      1.6e−01
    lpi-forest6    8.0e+02   8.0e+02    4.0e+05   4.0e+05     54      54      1.2e−03
    lpi-galenet    2.8e+01   2.8e+01    1.0e+02   9.2e+01     10      11      6.3e−01
    lpi-gosh       4.0e−02   4.0e−02    1.5e+04   7.1e+03   9580    1075      3.9e−05
    lpi-greenbea   5.2e+02   5.2e+02    1.4e+06   5.6e+05   3658    1609      1.1e−04
    lpi-itest2     4.5e+00   4.5e+00    2.3e+01   2.3e+01      7       7      6.5e−01
    lpi-itest6     2.0e+05   2.0e+05    4.8e+05   4.6e+05     12      14      4.8e−01
    lpi-mondou2    1.7e+04   1.7e+04    3.2e+06   2.7e+06    297     244      9.5e−02
    lpi-pang       2.4e−01   2.4e−01    1.4e+06   8.2e+04    536     336      1.4e−06
    lpi-pilot4i    3.3e+01   3.3e+01    6.9e+05   5.1e+04    773     627      3.6e−06
    lpi-reactor    2.0e+00   2.0e+00    1.5e+06   1.1e+06    569     357      4.1e−05
    lpi-woodinfe   1.5e+01   1.5e+01    3.6e+03   2.0e+03     60      87      5.0e−01

5. Discussion. We emphasize that our main result is constructive, and offers a procedure for determining the threshold value of the regularization parameter. However, we do not know how the threshold value might be computed directly from (P) without also solving (Pφ). It seems that admissible values of δ can in general be determined only by first solving (P) to obtain p*, and subsequently solving (Pφ) to obtain μ*_φ, and thus δ̄ (cf. (3.11)).

Suppose that a correct value of δ has been guessed and an exact regularized solution x obtained. The dual variables (y, z) obtained from the regularized problem are necessarily perturbed (cf. expressions (3.4) and (3.8)). Therefore, it is not possible to test the computed triple (x, y, z) against the optimality conditions of the original LP in order to verify that x is indeed an exact solution.

In practice, if it were prohibitively expensive to solve (P) and (Pφ) in addition, we might adopt an approach suggested by [Luc87b] and [Man04] for Tikhonov regularization. Lucidi and Mangasarian suggest successively solving the regularized LP with decreasing values δ₁ > δ₂ > · · · ; if successive regularized solutions do not change, then it is likely that a correct regularization parameter has been obtained (see the sketch below). We note that in many cases the threshold values δ̄ shown in Tables 4.1 and 4.2 are comfortably large; a value such as δ = 10⁻⁴ would cover 85% of these cases.
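The following sketch (ours, not from the paper) renders the Lucidi-Mangasarian continuation heuristic in Python for the same toy problem used in §4, again exploiting x ≥ 0 so that ‖x‖₁ = 1ᵀx; the stopping tolerance and the geometric shrink factor are arbitrary choices, and ties on a non-unique optimal face could in principle make successive solutions differ.

import numpy as np
from scipy.optimize import linprog

def regularized_solve(c, A_eq, b_eq, delta):
    """Solve (P_delta) with phi(x) = ||x||_1 = 1'x (valid since x >= 0)."""
    res = linprog(c + delta * np.ones(len(c)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(c), method="highs")
    return res.x

def continuation(c, A_eq, b_eq, delta0=1.0, shrink=0.1, tol=1e-8, max_iter=20):
    """Lucidi/Mangasarian heuristic: decrease delta until the regularized
    solution stops changing, suggesting that delta has fallen below the
    (unknown) threshold delta_bar."""
    x_prev = regularized_solve(c, A_eq, b_eq, delta0)
    delta = delta0
    for _ in range(max_iter):
        delta *= shrink
        x = regularized_solve(c, A_eq, b_eq, delta)
        if np.linalg.norm(x - x_prev, np.inf) <= tol:
            return x, delta        # successive solutions agree
        x_prev = x
    return x_prev, delta           # no stabilization detected

# Example: the tiny dual-degenerate LP from the sketch in Section 4
# (for which the true threshold is delta_bar = 1).
c = np.array([0.0, 0.0, 1.0])
A_eq, b_eq = np.array([[1.0, 1.0, 2.0]]), np.array([2.0])
x, delta = continuation(c, A_eq, b_eq)
print("delta =", delta, " x =", x)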
Acknowledgments. I wish to thank my colleague Kevin Leyton-Brown for giving me access to his CPLEX installation for the numerical experiments in §4. Also, my sincere thanks to Adam Oberman for many fruitful discussions and comments on earlier versions of this manuscript.

REFERENCES

[AG99] A. Altman and J. Gondzio, Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optim. Methods Softw. 11 (1999), 275-302.
[Ber82] D. P. Bertsekas, Constrained optimization and Lagrange multiplier methods, Academic Press, New York, 1982.
[Ber99] D. P. Bertsekas, Nonlinear programming, second ed., Athena Scientific, Belmont, MA, 1999.
[CDS01] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (2001), no. 1, 129-159.
[CGT00] A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Trust-region methods, MPS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics, Philadelphia, 2000.
[CRT04] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, Tech. report, Applied and Computational Analysis, California Institute of Technology, 2004.
[CRT05] E. J. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, to appear in Comm. Pure Appl. Math., 2005.
[DE03] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proc. Natl. Acad. Sci. USA 100 (2003), no. 5, 2197-2202.
[DET05] D. L. Donoho, M. Elad, and V. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, to appear in IEEE Trans. Inform. Theory, 2005.
[DT05] D. L. Donoho and J. Tanner, Sparse nonnegative solution of underdetermined linear equations by linear programming, Proc. Natl. Acad. Sci. USA 102 (2005), no. 27, 9446-9451.
[EHJT04] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Ann. Statist. 32 (2004), no. 2, 407-499.
[Fle85] R. Fletcher, An ℓ1 penalty method for nonlinear constraints, Numerical Optimization 1984 (P. T. Boggs, R. H. Byrd, and R. B. Schnabel, eds.), Society for Industrial and Applied Mathematics, Philadelphia, 1985, pp. 26-40.
[Gon03] C. C. Gonzaga, Generation of degenerate linear programming problems, Tech. report, Dept. of Mathematics, Federal University of Santa Catarina, Brazil, April 2003.
[Han98] P. C. Hansen, Rank-deficient and discrete ill-posed problems, Society for Industrial and Applied Mathematics, Philadelphia, 1998.
[HM79] S.-P. Han and O. L. Mangasarian, Exact penalty functions in nonlinear programming, Math. Program. 17 (1979), 251-269.
[KQQ03] C. Kanzow, H. Qi, and L. Qi, On the minimum norm solution of linear programs, J. Optim. Theory Appl. 116 (2003), 333-345.
[Luc87a] S. Lucidi, A finite algorithm for the least two-norm solution of a linear program, Optimization 18 (1987), no. 6, 809-823.
[Luc87b] S. Lucidi, A new result in the theory and computation of the least-norm solution of a linear program, J. Optim. Theory Appl. 55 (1987), no. 1, 103-117.
[Man84] O. L. Mangasarian, Normal solutions of linear programs, Math. Program. Stud. 22 (1984), 206-216.
[Man04] O. L. Mangasarian, A Newton method for linear programming, J. Optim. Theory Appl. 121 (2004), 1-18.
[MM79] O. L. Mangasarian and R. R. Meyer, Nonlinear perturbation of linear programs, SIAM J. Control Optim. 17 (1979), no. 6, 745-752.
[ST96] M. A. Saunders and J. A. Tomlin, Solving regularized linear programs using barrier methods and KKT systems, Tech. Report SOL 96-4, Dept. of EES & OR, Stanford University, Stanford, CA, December 1996. Also IBM Research Report RJ 10064.
[TA77] A. N. Tikhonov and V. Y. Arsenin, Solutions of ill-posed problems, V. H. Winston and Sons, Washington, D.C., 1977. Translated from Russian.
[Tib96] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), no. 1, 267-288.
[Ye97] Y. Ye, Interior-point algorithms: theory and analysis, Wiley, Chichester, UK, 1997.
[ZL02] Y.-B. Zhao and D. Li, Locating the least 2-norm solution of linear programs via a path-following method, SIAM J. Optim. 12 (2002), no. 4, 893-912.