EXACT REGULARIZATION OF LINEAR PROGRAMS

advertisement
Department of Computer Science Tech. Rep. TR-2005-31, December 2005
University of British Columbia
EXACT REGULARIZATION OF LINEAR PROGRAMS
MICHAEL P. FRIEDLANDER∗
Abstract. We show that linear programs (LPs) admit regularizations that either
contract the original (primal) solution set or leave it unchanged. Any regularization
function that is convex and has compact level sets is allowed—differentiability is
not required. This is an extension of the result first described by Mangasarian and
Meyer (SIAM J. Control Optim., 17(6), pp. 745–752, 1979). We show that there
always exist positive values of the regularization parameter such that a solution of
the regularized problem simultaneously minimizes the original LP and minimizes the
regularization function over the original solution set. We illustrate the main result
using the nondifferentiable `1 regularization function on a set of degenerate LPs.
Numerical results demonstrate how such an approach yields sparse solutions from the
application of an interior-point method.
Key words. linear programming, regularization, degeneracy, sparse solutions,
interior-point methods
1. Introduction. Consider the feasible linear program (LP)
(P)
minimize
n
x∈R
cTx
subject to Ax = b,
x ≥ 0,
with bounded optimal value, where A ∈ Rm×n , b ∈ Rm , and c ∈ Rn . We show
that (P) always admits perturbations of its objective function such that solutions
of the perturbed problem, henceforth referred to as the regularized LP, continue to
be solutions of (P). In fact, a solution of the regularized LP simultaneously solves
(P) and minimizes the regularization function over the optimal solution set of (P).
We prove this result for any regularization function that is convex and has compact
level sets—differentiability is not required. This may be regarded as an extension
of Mangasarian and Meyer’s result [MM79, Theorem 1], which additionally requires
differentiability of the regularization function.
Consider any regularization function φ : Rn → R that is compact and has compact
level sets. The regularized LP is given by
(Pδ )
minimize
x
cTx + δφ(x)
subject to Ax = b,
x ≥ 0,
where δ is a constant nonnegative regularization parameter. The regularization function φ may be nonlinear and/or nondifferentiable, so that (Pδ ) is not necessarily an LP
(although it is convex). In §3 we prove that solutions of (Pδ ) are also solutions of (P)
for all values of δ below some positive threshold value. Moreover, this set of solutions
minimizes φ over the optimal solution set of (P). Interestingly, this positive threshold
∗ Department of Computer Science, University of British Columbia, Vancouver V6T 1Z4, BC,
Canada ([email protected]). Research supported by the National Science and Engineering Council of
Canada.
December 26, 2005
1
2
M. P. FRIEDLANDER
value always exists. Our proof is constructive, and we show how this threshold value
may be computed.
We call such regularizations exact, and choose this term to draw an analogy
with exact penalty functions that are commonly used in algorithms for nonlinear
optimization. An exact penalty formulation of a problem can recover a solution
of the original problem for all values of the penalty parameter beyond a threshold
value. (See, for example, [HM79, Ber82, Fle85], and [CGT00] for a more recent
discussion.) As we have described, an analogous property holds for LPs: a solution
of the regularized problem is a solution of the original problem for all values of the
regularization parameter less than some threshold value.
Regularization is a technique used commonly for solving ill-posed problems whose
solutions may not be unique or may be acutely sensitive to the problem data. Regularization amounts to changing the problem statement in order to compute (usually
approximate) solutions that are “well-behaved.” For example, approximate solutions
with moderate norms may be preferred over very-large-norm solutions; or solutions
to nearby and stable problems may be preferred. (See [Han98] for a comprehensive
study of regularization for linear systems, including least-squares problems.)
As with other problem classes, exact regularization of LPs may be useful for
various reasons. If an LP does not have a unique optimal solution, exact regularization
may be used to select solutions with desirable properties. In particular, Tikhonov
[TA77] regularization can be used to select a minimum two-norm solution. Specialized
algorithms for computing minimum two-norm solutions of LPs have been proposed
by [Man84, Luc87a, Luc87b, ZL02, KQQ03, Man04], among others. Saunders and
Tomlin [ST96] and Altman and Gondzio [AG99] consider Tikhonov regularization as
a tool for influencing the conditioning of the underlying linear systems that arise in
the implementation of large-scale interior-point methods.
If we were to assume differentiability of φ, then our main result may be obtained
by using the KKT conditions of (P) and (Pδ ). This is the approach used by Mangasarian and Meyer [MM79]. We can discard the differentiability assumption, however, by
instead appealing to Lagrange duality theory. The advantage of this approach is evident for the need in many applications for nondifferentiable regularization functions.
A vital example is `1 regularization, which has lately received significant attention
in applications closely related to linear programming. Recent work related to signal
processing has focused on using LPs to obtain sparse solutions (i.e., solutions with
few nonzero elements) of underdetermined systems of linear equations Ax = b (with
the possible additional condition x ≥ 0); for examples, see [CDS01, CRT04, CRT05,
DT05]. In machine learning and statistics, `1 regularization of linear least-squares
problems (sometimes called lasso regression) plays a prominent role as an alternative
to Tikhonov regularization [Tib96, EHJT04].
To illustrate our main result, we consider the nondifferentiable `1 regularization
function as a technique for selecting sparse solutions of LPs that do not have unique
primal solutions (i.e., LPs that are dual degenerate). In §4 we give numerical results
of `1 regularization on a set of standard LPs from the Netlib test set and on a set of
randomly generated LPs with prescribed degeneracy. The numerical results highlight
the effectiveness of this approach for encouraging sparse solutions obtained via an
interior-point method. With the main result described in this paper, we can select a
regularization parameter δ so that the solution of the regularized LP gives a minimum
`1 solution over the original LP optimal solution set.
2. Preliminaries. The following assumptions hold implicitly throughout:
3
EXACT REGULARIZATION OF LPs
Assumption 2.1 (Feasibility and finiteness). The feasible set is nonempty—i.e.,
there exists a point x̄ such that Ax̄ = b and x̄ ≥ 0. Moreover, the optimal value of
(P) is bounded—i.e., p∗ > −∞, where p∗ is the optimal value of (P).
Assumption 2.2 (Convexity and compactness). The regularization function φ
is convex, and the level sets {x | φ(x) ≤ β} are compact for each β.
The first assumption guarantees that (P) and its dual have optimal solutions.
The first and second assumptions together ensure that (Pδ ) attains its optimal value
and that strong duality holds. (See, for example, Propositions 5.2.1 and 5.2.2 of
[Ber99] for these well-known results on linearly constrained convex programs and
LPs, respectively.)
Our main result hinges on relating the optimal solutions of (P) and (Pδ ) via the
(convex) optimization problem
(Pφ )
minimize
x
φ(x)
subject to Ax = b,
cTx ≤ p∗ ,
x ≥ 0.
By construction, any solution of (Pφ ) is also a solution of (P). (The converse, however,
does not generally hold.) We show that for all values of δ (including 0) below a certain
positive threshold value, every solution of (Pδ ) is also a solution of (Pφ ), and hence
also a solution of (P). The dual solutions of one problem are linear combinations of the
dual solutions of the other problem; the linear combination depends on δ and on the
Lagrange multiplier corresponding to the constraint cTx ≤ p∗ . Note that Lagrange
multipliers for (Pφ ) always exist because its constraints are linear. (Again, see [Ber99,
Proposition 5.2.1].)
We make the following definitions: The Lagrangian functions of (Pδ ) and (Pφ )
are respectively given by
Lδ (x, y, z) = cTx + δφ(x) − y T(Ax − b) − z Tx,
Lφ (x, y, z, µ) = φ(x) − y T(Ax − b) + µ(cTx − p∗ ) − z Tx,
where y ∈ Rm , z ∈ Rn , and µ ∈ R are independent variables. The associated Lagrange
dual functions are given by
gδ (y, z) = infn Lδ (x, y, z),
x∈R
gφ (y, z, µ) = infn Lφ (x, y, z, µ).
x∈R
Denote by S and Sδ the set of (primal) solutions of (P) and (Pδ ), respectively. Assumption 2.1 ensures that S and Sδ are nonempty. Denote by p∗δ the optimal value
of (Pδ ), and as defined above, p∗ as the optimal value of (P).
3. Main result. The following theorem is a precise statement of our main result.
Theorem 3.1. There exists a constant δ̄ > 0 such that Sδ ⊆ S for all δ ∈ [0, δ̄].
Moreover, each element in Sδ minimizes φ over S.
Proof. We first establish a few key properties of the dual optimal values of (P) and
(Pφ ). Assumption 2.1 implies that there exist optimal dual variables y ∗ and z ∗ ≥ 0
of (P) that satisfy (among other conditions)
ATy ∗ + z ∗ = c
and
bTy ∗ = p∗ .
(3.1)
4
M. P. FRIEDLANDER
Assumptions 2.1 and 2.2 imply that there exist optimal dual variables yφ∗ , zφ∗ ≥ 0, and
µ∗φ ≥ 0, of (Pφ ) such that
gφ (yφ∗ , zφ∗ , µ∗φ ) = bTyφ∗ − µ∗φ p∗ + infn φ(x) − (ATyφ∗ + zφ∗ − µ∗φ c)Tx
x∈R
(3.2)
is maximal. Hence, the infimum on the right-hand side of (3.2) attains its minimum
at some point x∗φ . (Recall that φ has compact level sets over Rn .) By strong duality,
x∗φ is optimal for (Pφ ), and
φ(x∗φ ) = gφ (yφ∗ , zφ∗ ).
(3.3)
We next consider the regularized problem (Pδ ). Its feasible set coincides with
(P), so the assumptions ensure that there exist optimal dual variables yδ∗ and zδ∗ ≥ 0.
We show that these optimal dual variables are in fact linear combinations of the
optimal dual variables of (P) and (Pφ ). The linear combination appears in two forms,
depending on whether µ∗φ is zero or positive. We consider these two cases in turn.
Case 1: µ∗φ = 0. We claim that the optimal dual variables of (Pδ ) are given by
yδ∗ := y ∗ + δyφ∗ ,
zδ∗ := z ∗ + δzφ∗ .
(3.4)
Clearly zδ∗ ≥ 0, as required. Evaluate gδ (y, z) with the pair (yδ∗ , zδ∗ ), and use (3.4) to
obtain
gδ (yδ∗ , zδ∗ ) ≡ infn Lδ (x, yδ∗ , zδ∗ )
=
x∈R
bTyδ∗
T ∗
+ infn δφ(x) − (ATyδ∗ + zδ∗ − c)Tx
x∈R
δbTyφ∗
+ infn δφ(x) − {ATy ∗ + z ∗ − c + δ(ATyφ∗ + zφ∗ )}Tx
x∈R
= p∗ + δ bTyφ∗ + infn φ(x) − (ATyφ∗ + zφ∗ )Tx ,
=b y +
(3.5)
x∈R
in which we use (3.1) to arrive at the last expression. Moreover, (3.2) and (3.3) (with
µ∗φ = 0) imply that the infimum on the right-hand-side of (3.5) achieves its minimum
at the point x∗φ . Substitute (3.2) and (3.3) into (3.5) to obtain
gδ (yδ∗ , zδ∗ ) = p∗ + δφ(x∗φ ).
(3.6)
Note that x∗φ is feasible for (Pφ ) and for (Pδ ), so that
p∗ ≥ cTx∗φ
and
cTx∗φ + δφ(x∗φ ) ≥ p∗δ .
(3.7)
It then follows from (3.6) and (3.7) that gδ (yδ∗ , zδ∗ ) ≥ p∗δ . By weak duality, however,
gδ (yδ∗ , zδ∗ ) ≤ p∗δ , so that in fact gδ (yδ∗ , zδ∗ ) = p∗δ .
This proves that strong duality holds, and that the dual optimum is attained,
at the pair (yδ∗ , zδ∗ ) defined by (3.4). Therefore, (3.5) and (3.6) together imply that
inf x∈Rn Lδ (x, yδ∗ , zδ∗ ) attains its minimum at x∗φ , so that in fact x∗φ is a solution of
(Pδ ). Moreover, x∗φ is feasible for (P), so (3.7) implies that it is also a solution of
(P). We therefore conclude that, when µ∗φ = 0, then x∗φ (and in fact, every solution
of (Pφ )) is an optimal solution of (Pδ ) and also of (P), for all values of δ ≥ 0. In this
case, δ̄ = +∞.
5
EXACT REGULARIZATION OF LPs
Case 2: µ∗φ > 0. We claim that the optimal dual variables of (Pδ ) are given by
yδ∗ := (1 − λ)y ∗ +
λ ∗
y ,
µ∗φ φ
zδ∗ := (1 − λ)z ∗ +
λ ∗
z ,
µ∗φ φ
(3.8)
for any λ ∈ [0, 1]. Clearly zδ∗ ≥ 0, as required. Evaluate gδ (y, z) with this pair, and
consider values of δ ≡ λ/µ∗φ , to get
gδ (yδ∗ , zδ∗ ) ≡ infn Lδ (x, yδ∗ , zδ∗ )
x∈R
= bTyδ∗ + infn δφ(x) − (ATyδ∗ + zδ∗ − c)Tx
x∈R
λ T ∗
b yφ
µ∗φ
T
λ
T ∗
∗
T ∗
∗
+ infn δφ(x) − (1 − λ)(A y + z ) − c + ∗ (A yφ + zφ ) x (3.9)
x∈R
µφ
T
λ T ∗
λ
λ
∗
T ∗
∗
= (1 − λ)p + ∗ b yφ + infn ∗ φ(x) −
(A yφ + zφ ) − λc x
x∈R µφ
µφ
µ∗φ
λ
= p∗ + ∗ bTyφ∗ − µ∗φ p∗ + infn φ(x) − (ATyφ∗ + zφ∗ − µ∗φ c)Tx ,
x∈R
µφ
= (1 − λ)bTy ∗ +
where we use (3.1) to arrive at the second-to-last inequality. As we noted for (3.5),
the last infimum of the right-hand side of (3.9) achieves its minimum at the point x∗φ
(but this time with µ∗φ > 0). Substitute (3.2) and (3.3) into (3.9) to obtain
gδ (yδ∗ , zδ∗ ) = p∗ +
λ
φ(x∗φ ).
µ∗φ
(3.10)
With the same arguments as used for Case 1, we can conclude that, when µ∗φ > 0,
then x∗φ (and in fact, every solution of (Pφ )) is an optimal solution of (Pδ ) and also
of (P), for all values of
δ ∈ [0, δ̄]
with δ̄ :=
1
> 0.
µ∗φ
(3.11)
The last statement of the theorem follows immediately from the fact that x∗φ
solves both (P) and (Pφ ).
Note that the range of values admitted by (3.11) is consistent with the extreme
case µ∗φ & 0. In that situation, all nonnegative values of the penalty parameter are
allowed by Theorem 3.1, as is predicted when µ∗φ = 0 (Case 1 of the theorem).
4. Sparse solutions. In this section we illustrate a practical application of Theorem 3.1. The aim is to obtain sparse solutions of linear programs that do not have
unique optimal solutions. For the following discussion we let φ(x) = kxk1 , which
clearly satisfies both standing assumptions.
Regularization based on the `1 norm has been used in many applications with
the goal of obtaining (or approximating) the sparsest solution of underdetermined
systems of linear equations and least-squares problems. Some recent examples include
[CDS01, DE03, DT05, DET05].
6
M. P. FRIEDLANDER
For underdetermined systems of equations that arise in fields such as signal processing, [CDS01], [CRT04], and [DT05] advocate solving the problem
minimize kxk1
x
subject to Ax = b
(and possibly x ≥ 0)
(4.1)
in order to obtain a sparse solution. (Well-known techniques exist for recasting (4.1)
as an LP.) The sparsest solution is in fact given by minimizing the so-called 0-norm
kxk0 , which counts the number of nonzero elements in x. The combinatorial nature
of such a problem, however, makes it computationally intractable for all but the most
trivial cases. Interestingly, there exist conditions under which a solution of (4.1) is
the sparsest feasible solution; see [CRT04] and [DT05].
Following this approach, we use Theorem 3.1 as a guide for obtaining least `1 norm solutions of a generic LP
minimize cTx subject to Ax = b,
x
l ≤ x ≤ u,
(4.2)
by solving its regularized version:
minimize cTx + δkxk1
x
subject to Ax = b,
l ≤ x ≤ u.
(4.3)
(The vectors l and u are lower and upper bounds on x.) In many of the numerical
tests given below, the exact `1 -regularized solution of (4.2) (given by (4.3) for smallenough values of δ) is considerably sparser than the solution obtained by solving (4.2)
directly. In each case, we solve the regularized and unregularized problems with the
same interior-point solver. We emphasize that, with the appropriate choice of the
regularization parameter, the solution of the regularized LP is also a solution of the
original LP.
We consider two sets of test problems in our numerical experiments. The problems
of the first set are constructed from random data using a degenerate LP generator
described by [Gon03]. Those of the second are derived from the infeasible LPs in the
Netlib collection (http://www.netlib.org/lp/infeas/). Both sets of test problems
are further described in §§4.1–4.2.
For each test problem we follow the same procedure. We first reformulate the
problem as an LP; this problem corresponds to (P). (There may, of course, be upper
and lower bounds on x.) We solve the unregularized LP in order to obtain x∗ (an
unregularized solution), and thus the optimal value p∗ := cTx∗ . Next, in order to
obtain the threshold value δ̄ of the regularization parameter, we solve the LP that
corresponds to (Pφ ). (In the nontrivial case, this value is given by (3.11).) Finally,
an exact regularized solution x∗δ is obtained by solving (Pδ ) with δ := δ̄/2.
We use the interior-point algorithm implemented in CPLEX 9.1 to solve each
subproblem. The default CPLEX options are used, except for crossover = 0 and
comptol = 1e-10. The first option forces CPLEX to use its barrier algorithm and to
not “cross over” to an optimal basic solution. Under certain conditions, we expect
the interior method to converge to the analytic center of the optimal face (see [Ye97,
Theorems 2.16 and 2.17]). The second option tightens CPLEX’s convergence tolerance
from its default of 1e-8 to its smallest allowable setting. We do not advocate such
a tight tolerance in practice, but this change aids in computing the sparsity of a
computed solution, which we determine as
√
(4.4)
kxk0 = card{xi | |xi | > }.
7
EXACT REGULARIZATION OF LPs
Table 4.1
Random LPs generated with various levels of increasing dimension of the optimal primal face.
LP
random-0
random-20
random-40
random-60
random-80
random-100
cTx∗
cTx∗δ
kx∗ k1
kx∗δ k1
kx∗ k0 kx∗δ k0
2.5e−13
5.6e−13
3.8e−12
3.9e−14
9.1e−12
1.8e−16
1.0e−13
6.6e−13
3.7e−12
9.2e−11
8.4e−13
3.2e−12
9.1e+01
2.9e+02
4.9e+02
6.7e+02
8.9e+02
1.0e+03
9.1e+01
2.0e+02
2.9e+02
3.6e+02
4.6e+02
5.4e+02
100
278
459
637
816
997
100
100
100
101
100
102
δ̄
1.5e−04
2.2e−02
2.9e−02
3.3e−02
2.1e−01
1.1e−01
√
Here, ≈ 2.2 · 10−16 is the relative machine precision. The value is slightly larger
than the specified convergence tolerance.
The AMPL model and data files used to generate all of the numerical results
presented in §§4.1–4.2 can be obtained at http://www.cs.ubc.ca/~mpf/exactreg/.
4.1. Random LPs. Six dual-degenerate LPs were constructed using Gonzaga’s
Matlab generator [Gon03]. This Matlab program accepts as inputs the problem
size and the dimensions of the optimal primal and dual faces, Dp and Dd , respectively.
Gonzaga shows that these quantities must satisfy
0 ≤ Dp ≤ n − m − 1
and
0 ≤ Dd ≤ m − 1.
(4.5)
The six LPs are constructed with parameters n = 1000, m = 100, Dd = 0, and various
levels of Dp set as 0%, 20%, 40%, 60%, 80%, and 100% of the maximum of 899 (given
by (4.5)). The problems are respectively labeled random-0, random-20, random-40,
and so on.
Table 4.1 summarizes the results. We confirm that in each case the optimal
values of the unregularized and regularized problems are nearly identical (at least to
within the specified tolerance). Except for the “control” problem random-0, the exact
regularized solution x∗δ has a strictly smaller `1 norm, and is considerably sparser,
than the unregularized solution x∗ .
4.2. Infeasible LPs. The second set of problems is derived from a subset of the
infeasible Netlib LPs. For each infeasible LP, we discard the original objective, and
instead form the problem
minimize kAx − bk1
x
subject to
l ≤ x ≤ u,
(Pi)
and its regularized counterpart
minimize kAx − bk1 + δkxk1
x
subject to
l ≤ x ≤ u.
(Piδ )
The unregularized problem (Pi) mimics the plausible situation where we wish to fit
a set of infeasible equations in the `1 -norm sense. But because the `1 norm is not
strictly convex, a solution of (Pi) may not be unique. The regularized problem (Piδ )
might be used in practice to further restrict that solution set.
The following infeasible Netlib LPs were eliminated because CPLEX returned
some error during the solution of (Pi) or (Piδ ): lpi-bgindy, lpi-cplex2, lpi-gran, lpi-klein1,
lpi-klein2, lpi-klein3, lpi-qual, lpi-refinery, lpi-vol1.
Table 4.2 summarizes the results. The exact regularized solution x∗δ has a smaller
or equal `1 norm than the unregularized solution x∗ in all cases. In 60% of the
8
M. P. FRIEDLANDER
Table 4.2
Least `1 feasible solutions of the infeasible Netlib LPs.
LP
lpi-bgdbg1
lpi-bgetam
lpi-bgprtr
lpi-box1
lpi-ceria3d
lpi-chemcom
lpi-cplex1
lpi-ex72a
lpi-ex73a
lpi-forest6
lpi-galenet
lpi-gosh
lpi-greenbea
lpi-itest2
lpi-itest6
lpi-mondou2
lpi-pang
lpi-pilot4i
lpi-reactor
lpi-woodinfe
cTx∗
cTx∗δ
kx∗ k1
kx∗δ k1
3.6e+02
5.4e+01
1.9e+01
1.0e+00
2.5e−01
9.8e+03
3.2e+06
1.0e+00
1.0e+00
8.0e+02
2.8e+01
4.0e−02
5.2e+02
4.5e+00
2.0e+05
1.7e+04
2.4e−01
3.3e+01
2.0e+00
1.5e+01
3.6e+02
5.4e+01
1.9e+01
1.0e+00
2.5e−01
9.8e+03
3.2e+06
1.0e+00
1.0e+00
8.0e+02
2.8e+01
4.0e−02
5.2e+02
4.5e+00
2.0e+05
1.7e+04
2.4e−01
3.3e+01
2.0e+00
1.5e+01
1.6e+04
6.0e+03
4.7e+03
5.2e+02
8.8e+02
1.5e+05
2.4e+09
4.8e+02
4.6e+02
4.0e+05
1.0e+02
1.5e+04
1.4e+06
2.3e+01
4.8e+05
3.2e+06
1.4e+06
6.9e+05
1.5e+06
3.6e+03
1.3e+04
5.3e+03
3.0e+03
2.6e+02
8.8e+02
3.8e+04
1.5e+09
3.0e+02
3.0e+02
4.0e+05
9.2e+01
7.1e+03
5.6e+05
2.3e+01
4.6e+05
2.7e+06
8.2e+04
5.1e+04
1.1e+06
2.0e+03
kx∗ k0 kx∗δ k0
518
633
25
261
1780
711
3811
215
211
54
10
9580
3658
7
12
297
536
773
569
60
437
441
20
261
1767
591
3489
215
211
54
11
1075
1609
7
14
244
336
627
357
87
δ̄
3.3e−03
3.4e−04
3.7e−01
9.9e−01
6.7e−04
3.1e−01
1.0e−02
1.6e−01
1.6e−01
1.2e−03
6.3e−01
3.9e−05
1.1e−04
6.5e−01
4.8e−01
9.5e−02
1.4e−06
3.6e−06
4.1e−05
5.0e−01
cases, the regularized solution is sparser than the unregularized solution. In 25% of
the cases, the solutions have the same amount of sparsity. In 15% of the cases (lpigalenet, lpi-itest6, and lpi-woodinfe), the regularized solutions are actually less sparse,
even though the `1 norm is lower.
5. Discussion. We emphasize that our main result is constructive, and offers a
procedure for determining the threshold value of the regularization parameter. However, we do not know how the threshold value might be computed directly from (P)
without also solving (Pφ ). It seems that admissible values of δ can in general only be
determined by first solving (P) to obtain p∗ , and subsequently solving (Pφ ) to obtain
µ∗φ , and thus δ̄ (c.f. (3.11)).
Suppose that a correct value of δ has been guessed, and an exact regularized solution x obtained. The dual variables (y, z) obtained from that regularized problem are
necessarily perturbed (c.f. expressions (3.4) and (3.8)). Therefore, it is not possible
to test the computed triple (x, y, z) against the optimality conditions of the original
LP in order to verify that x is indeed an exact solution.
In practice, if it were prohibitively expensive to solve (P) and (Pφ ) additionally,
we might adopt an approach suggested by [Luc87b] and [Man04] for Tikhonov regularization. Lucidi and Mangasarian suggest successively solving the regularized LP
with decreasing values δ1 > δ2 > · · · ; if successive regularized solutions do not change,
then it is likely that a correct regularization parameter has been obtained. We note
that in many cases, the threshold values δ̄ shown in Tables 4.1 and 4.2 are comfortably
large, and a value such as δ = 10−4 would cover 85% of the these cases.
Acknowledgments. I wish to thank my colleague Kevin Leyton-Brown for giving
me access to his CPLEX installation for the numerical experiments in §4. Also, my
EXACT REGULARIZATION OF LPs
9
sincere thanks to Adam Oberman for many fruitful discussions and comments on
earlier versions of this manuscript.
REFERENCES
[AG99]
[Ber82]
[Ber99]
[CDS01]
[CGT00]
[CRT04]
[CRT05]
[DE03]
[DET05]
[DT05]
[EHJT04]
[Fle85]
[Gon03]
[Han98]
[HM79]
[KQQ03]
[Luc87a]
[Luc87b]
[Man84]
[Man04]
[MM79]
[ST96]
[TA77]
[Tib96]
A. Altman and J. Gondzio, Regularized symmetric indefinite systems in interior point
methods for linear and quadratic optimization, Optimization Methods and Software
11 (1999), 275–302.
D. P. Bertsekas, Constrained optimization and Lagrange multiplier methods, Academic
Press, New York, 1982.
, Nonlinear programming, second ed., Athena Scientific, Belmont, MA, 1999.
S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit,
SIAM Rev. 43 (2001), no. 1, 129–159.
A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Trust-region methods, MPS-SIAM Series on Optimization, Society of Industrial and Applied Mathematics, Philadelphia,
2000.
E. J. Candés, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal
reconstruction from highly incomplete frequency information, Tech. report, Applied
and Computational Analysis, California Institute of Technology, 2004.
, Stable signal recovery from incomplete and inaccurate measurements, To appear
in Comm. Pure Appl. Math., 2005.
D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal)
dictionaries via `1 minimization, Proc. Nat. Acad. Sci. USA 100 (2003), no. 5,
2197–2202.
D. L. Donoho, M. Elad, and V. Temlyakov, Stable recovery of sparse overcomplete
representations in the presence of noise, To appear in IEEE Trans. Info. Theory,
2005.
D. L. Donoho and J. Tanner, Sparse nonnegative solution of underdetermined linear
equations by linear programming, Proc. Nat. Acad. Sci. USA 102 (2005), no. 27,
9446–9451.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, The Annals
of Statistics 32 (2004), no. 2, 407–499.
R. Fletcher, An `1 penalty method for nonlinear constraints, Numerical Optimization
1984 (Philadelphia) (P. T. Boggs, R. H. Byrd, and R. B. Schnabel, eds.), Society
of Industrial and Applied Mathematics, 1985, pp. 26–40.
C. C. Gonzaga, Generation of degenerate linear programming problems, Tech. report,
Dept. of Mathematics, Federal University of Santa Catarina, Brazil, April 2003.
P. C. Hansen, Rank-deficient and discrete ill-posed problems, Society of Industrial and
Applied Mathematics, Philadelphia, 1998.
S.-P. Han and O. L. Mangasarian, Exact penalty functions in nonlinear programming,
Math. Prog. 17 (1979), 251–269.
C. Kanzow, H. Qi, and L. Qi, On the minimum norm solution of linear programs, J.
Optim. Theory Appl. 116 (2003), 333–345.
S. Lucidi, A finite algorithm for the least two-norm solution of a linear program, Optimization 18 (1987), no. 6, 809–823.
, A new result in the theory and computation of the least-norm solution of a
linear program, J. Optim. Theory Appl. 55 (1987), no. 1, 103–117.
O. L. Mangasarian, Normal solutions of linear programs, Math. Prog. Stud. 22 (1984),
206–216.
, A Newton method for linear programming, J. Optim. Theory Appl. 121 (2004),
1–18.
O. L. Mangasarian and R. R. Meyer, Nonlinear perturbation of linear programs, SIAM
J. Control Optim. 17 (1979), no. 6, 745–752.
M. A. Saunders and J. A. Tomlin, Solving regularized linear programs using barrier
methods and KKT systems, Tech. Report SOL Report 96-4, Stanford University,
Department of EES & OR, Stanford, CA 94305, December 1996, Also IRM Research
Report RJ 10064.
A. N. Tikhonov and V. Y. Arsenin, Solutions of ill-posed problems, V. H. Winston and
Sons, Washington, D.C., 1977, Translated from Russian.
R. Tibshirani, Regression shrinkage and selection via the Lasso, J. Royal. Statist. Soc
B. 58 (1996), no. 1, 267–288.
10
[Ye97]
[ZL02]
M. P. FRIEDLANDER
Y. Ye, Interior-point algorithms: Theory and analysis, Wiley, Chichester, UK, 1997.
Y.-B. Zhao and D. Li, Locating the least 2-norm solution of linear programs via a pathfollowing method, SIAM J. Optim. 12 (2002), no. 4, 893–912.
Download