Citation: Chandrasekaran, Venkat et al. “The Convex Algebraic Geometry of Linear Inverse Problems.” 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2010. 699–703. © Copyright 2010 IEEE
As Published: http://dx.doi.org/10.1109/ALLERTON.2010.5706975
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Version: Final published version
Citable Link: http://hdl.handle.net/1721.1/72963
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Forty-Eighth Annual Allerton Conference
Allerton House, UIUC, Illinois, USA
September 29 - October 1, 2010
The Convex Algebraic Geometry of Linear Inverse Problems
Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan S. Willsky
Abstract— We study a class of ill-posed linear inverse problems in which the underlying model of interest has simple algebraic structure. We consider the setting in which we have access to a limited number of linear measurements of the underlying model, and we propose a general framework based on convex optimization in order to recover this model. This formulation generalizes previous methods based on ℓ1-norm minimization and nuclear-norm minimization for recovering sparse vectors and low-rank matrices from a small number of linear measurements. Examples of problems to which our framework is applicable include (1) recovering an orthogonal matrix from limited linear measurements, (2) recovering a measure given random linear combinations of its moments, and (3) recovering a low-rank tensor from limited linear observations.
I. INTRODUCTION
A fundamental principle in the study of ill-posed inverse
problems is the incorporation of prior information. Such
prior information typically specifies some sort of simplicity
present in the model. This simplicity could concern the
number of genes present in a signature for disease, correlation structures in time series data, or geometric constraints
on molecular configurations. Unfortunately, naive mathematical characterizations of model simplicity often result in
intractable optimization problems.
This paper formalizes a methodology for transforming
notions of simplicity into convex penalty functions. These
penalty functions are induced by the convex hulls of simple
models. We show that algorithms resulting from these formulations allow for information extraction with the number
of measurements scaling as a function of the number of
latent degrees of freedom of the simple model. We provide
a methodology for estimating these scaling laws and discuss
several examples. We focus on situations in which the basic
models are semialgebraic, in which case the regularized
inverse problems can be solved or approximated by semidefinite programming.
Specifically, suppose that there is an object of interest x ∈ R^p that is formed as the linear combination of a small number of elements of a basic semialgebraic set A:

x = ∑_{i=1}^{k} αi Ai,   Ai ∈ A.
Here we typically assume that k is relatively small. For example, the set A may be the algebraic variety of unit-norm 1-sparse vectors or of unit-norm rank-1 matrices, in which case x is either a k-sparse vector or a rank-k matrix.
VC, PP, and AW are with the Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139. BR is with the Computer Sciences Department, University of Wisconsin, Madison, WI, 53706.
978-1-4244-8216-0/10/$26.00 ©2010 IEEE
Further, we do not have access to x directly but to some set of linear measurements of x given by a matrix M ∈ R^{n×p}:

b = Mx,

with n ≪ p.
Main Question Is it possible to recover x from the limited
information b?
Of course, this question is in general computationally intractable. Therefore we propose a natural convex relaxation framework to solve this problem, akin to ℓ1-norm minimization for recovering sparse vectors [6], [7], [3] or nuclear-norm minimization for recovering low-rank matrices [10], [16], [4]. This convex relaxation is obtained by
convexifying the algebraic set A . We also describe a general
framework for analyzing when this relaxation succeeds in
recovering the underlying object x. A few examples of
questions that can be answered by our broad formulation
include:
• Given the eigenvalues of a matrix and additionally a few linear measurements of the matrix, when can we recover the matrix?
• When does a system of linear equations have a unique integer solution? This question is related to the integrality gap between a class of integer linear programs and their linear-programming relaxations, and has previously been studied in [8], [14].
• When can we recover a measure given a few linear measurements of its moments?
In some cases the convex program obtained by convexifying the base algebraic set A may itself be intractable to
solve. For such problems we briefly outline how to obtain a
hierarchy of further semidefinite relaxations with increasing
complexity based on the Positivstellensatz [15].
Outline We describe the problem setup and the generic
construction of a convex relaxation in Section II. In
Section III we give conditions under which these convex relaxations succeed in recovering the underlying low-dimensional model. Section IV provides a brief outline of
the Positivstellensatz-based technique for further relaxing the
convex program described in Section II. Section V concludes
this paper.
This paper provides an outline of our framework and
a subset of the results, with the description of additional
detailed analysis and example applications deferred to a
longer report.
II. PROBLEM SETUP AND CONVEX RELAXATION

A. Setup
A basic semialgebraic set is the solution set of a finite
system of polynomial equations and polynomial inequalities
[1]. We will generally consider only algebraic sets that are
bounded, such as finite collections of points, or the sets of
unit-norm 1-sparse vectors or of unit-norm rank-1 matrices.
Given such a set A ⊆ R^p, let x ∈ R^p be a vector formed as a linear combination of elements of A:
x = ∑_{i=1}^{k} αi Ai,   Ai ∈ A, αi ≥ 0.
Here k is assumed to be relatively small. While x is not
known directly, we have access to linear measurements of x:
b = Mx,
where M ∈ R^{n×p} with n ≪ p.
The questions that we address include: When can we
recover x from the linear measurements b? Is it possible
to do so tractably using convex optimization? Under what
conditions does this convex program exactly recover the
underlying x? In particular, how many random measurements n are required for exact recovery using the convex program?
In Section II-C we give several examples in which answering such questions is of interest.
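As a concrete numerical instance of this setup (our own illustration, not code from the paper; all sizes are arbitrary), the following sketch builds x as a combination of k unit-norm 1-sparse atoms and forms the measurements b = Mx with a Gaussian M:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, k = 100, 25, 4

# Atoms: standard basis vectors e_i (unit-norm 1-sparse atoms); x is a
# nonnegative combination of k of them.
support = rng.choice(p, size=k, replace=False)
x = np.zeros(p)
x[support] = rng.uniform(0.5, 2.0, size=k)

M = rng.standard_normal((n, p))   # n << p generic linear measurements
b = M @ x

assert b.shape == (n,) and np.count_nonzero(x) == k
```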
B. Convex relaxation
Before constructing a general convex relaxation, we define
the convex hull of an algebraic set. When a basic semialgebraic set A ⊂ R^p is bounded, we denote its convex hull as
follows:
C(A ) = conv(A ).
Here conv denotes the convex hull of a set. In many cases of interest the set A is centrally symmetric (i.e., there exists a “center” c ∈ R^p such that A − c = −(A − c)).
In such a situation, the set C(A ) is also centrally symmetric
and induces a norm (perhaps recentered and restricted to a
subspace):
‖y‖A = inf{t ≥ 0 | y ∈ t C(A)}.
Of course we implicitly assume here that the norm ‖·‖A is only evaluated for those y that lie in the affine hull of C(A).
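For a finite atom set the norm ‖·‖A can be certified directly from the definition. A small check (our own illustration) with A = {±e_i}, for which C(A) is the cross-polytope and ‖·‖A is the ℓ1 norm:

```python
import numpy as np

# Atoms: the signed standard basis {±e_i} in R^p; C(A) is the cross-polytope,
# and the induced gauge ||y||_A is the l1 norm.
p = 6
y = np.array([0.5, -2.0, 1.0, 0.0, 3.0, -1.5])

t = np.sum(np.abs(y))          # claimed value of ||y||_A
# Certificate that y ∈ t·C(A): y/t is the convex combination of the atoms
# sign(y_i)·e_i with weights |y_i|/t.
weights = np.abs(y) / t
combo = sum(w * np.sign(yi) * e
            for w, yi, e in zip(weights, y, np.eye(p)) if w > 0)
assert np.isclose(weights.sum(), 1.0)
assert np.allclose(t * combo, y)
```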
Consequently we consider the following convex program
to recover x given the limited linear measurements b:
x̂ = arg min_{x̃} ‖x̃‖A   s.t.   M x̃ = b.     (1)
In the next section we present a number of examples that
fit into the above setting, and in Section III we provide a
simple unified analysis of conditions for exact recovery using
the convex program (1).
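When A is the set of signed standard basis vectors, program (1) is exactly ℓ1 minimization, which is a linear program. A sketch (our own illustration; the sizes are arbitrary but chosen so that exact recovery is overwhelmingly likely for Gaussian measurements):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
p, n, k = 60, 40, 3
x = np.zeros(p)
x[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)
M = rng.standard_normal((n, p))
b = M @ x

# min ||x~||_1 s.t. M x~ = b, via the standard split x~ = u - v with u, v >= 0.
res = linprog(np.ones(2 * p), A_eq=np.hstack([M, -M]), b_eq=b,
              bounds=[(0, None)] * (2 * p))
x_hat = res.x[:p] - res.x[p:]
assert res.status == 0 and np.allclose(x_hat, x, atol=1e-5)
```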
C. Examples

A number of applications fit into our framework for recovering low-dimensional objects from a few linear measurements:
Recovering a vector from a list Suppose that there is
a vector x ∈ R p , and that we are given the entries of this
vector without the locations of these entries. For example if
x = [3 1 2 2 4]′, then we are just given the list of numbers
[1 2 2 3 4] without their positions in x. Further suppose that
we have access to a few linear measurements of x. In such a
scenario, can we recover x by solving a convex program?
Such a problem is of interest in recovering rankings of
elements of a set. For this problem the algebraic set A is
the set of all permutations of x (which we know since we
have the list of numbers that compose x), and the convex
hull C(A ) is the permutahedron [19], [18].
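A small sketch of this example (our own illustration; with n = 4 generic measurements of a vector in R^5 the affine constraints already pin down a unique point of the permutahedron, so the LP must return the hidden arrangement):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
x = np.array([3.0, 1.0, 2.0, 2.0, 4.0])        # hidden arrangement of the list
verts = np.unique(np.array(list(itertools.permutations(x))), axis=0)  # vertices

n = 4                                          # n < p = 5 generic measurements
M = rng.standard_normal((n, 5))
b = M @ x

# Find a point of the permutahedron conv(verts) consistent with b: variables
# are barycentric weights lam >= 0 with sum(lam) = 1 and M (verts^T lam) = b.
A_eq = np.vstack([M @ verts.T, np.ones(len(verts))])
b_eq = np.concatenate([b, [1.0]])
res = linprog(np.zeros(len(verts)), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(verts))
x_hat = verts.T @ res.x
assert res.status == 0 and np.allclose(x_hat, x, atol=1e-6)
```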
Recovering a matrix given eigenvalues This problem is
in a sense the non-commutative analog of the one above.
Suppose that we have the eigenvalues λ of a symmetric
matrix, but no information about the eigenvectors. Can we
recover such a matrix given some additional linear measurements? In this case the set A is the set of all symmetric
matrices with eigenvalues λ , and the convex hull C(A ) is
given by the Schur-Horn orbitope [18].
Recovering permutation matrices Another problem of
interest in a ranking context is that of recovering permutation
matrices from partial information. Suppose that a small
number k of rankings of p candidates is preferred by a
population. By conducting surveys of this population one can
obtain partial linear information of these preferred rankings.
The set A here is the set of permutation matrices and the
convex hull C(A ) is the Birkhoff polytope, or the set of
doubly stochastic matrices [19]. The problem of recovering
rankings was recently studied by Jagabathula and Shah [12]
in the setting in which partial Fourier information is provided
about a sparse function over the symmetric group; however,
their algorithmic approach for solving this problem was not
based on convex optimization.
Recovering integer solutions In integer programming
one is often interested in recovering vectors in which the
entries take on values of ±1. Suppose that there is such a
sign-vector, and we wish to recover this vector given linear
measurements. This corresponds to a version of the multiknapsack problem [14]. In this case the algebraic set A is
the set of all sign-vectors, and the convex hull C(A)
is the hypercube. The image of this hypercube under a linear
map is also referred to as a zonotope [19].
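The hypercube case can be sketched numerically (our own illustration; n is taken comfortably above the roughly p/2 measurements suggested by the results discussed in Section III, so the feasible set is a single point with high probability):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
p, n = 50, 40
x = rng.choice([-1.0, 1.0], size=p)
M = rng.standard_normal((n, p))
b = M @ x

# Feasibility LP over the hypercube [-1, 1]^p (the convex hull of the
# sign-vectors): when N(M) misses the tangent cone at x, the feasible set is
# the single point x, so any LP solution recovers the sign-vector.
res = linprog(np.zeros(p), A_eq=M, b_eq=b, bounds=[(-1, 1)] * p)
assert res.status == 0 and np.allclose(res.x, x, atol=1e-6)
```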
Recovering orthogonal matrices In several applications
matrix variables are constrained to be orthogonal, which
is a non-convex constraint and may lead to computational
difficulties. We consider one such simple setting in which
we wish to recover an orthogonal matrix given limited
information in the form of linear measurements. In this
example the set A is the set of p × p orthogonal matrices, and
the set C(A ) is the spectral norm ball. Therefore the convex
program (1) reduces to solving a spectral norm minimization
problem subject to linear constraints.
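Two quick numerical checks of this geometry (our own illustration): orthogonal matrices sit on the boundary of the spectral-norm ball, and their convex combinations stay inside it.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 6
Q1, _ = np.linalg.qr(rng.standard_normal((p, p)))   # random orthogonal atoms
Q2, _ = np.linalg.qr(rng.standard_normal((p, p)))

# Every orthogonal matrix has spectral norm 1 (boundary of the ball) ...
assert np.isclose(np.linalg.norm(Q1, 2), 1.0)

# ... and convex combinations of orthogonal matrices stay inside the ball,
# consistent with C(A) being the unit spectral-norm ball.
mix = 0.3 * Q1 + 0.7 * Q2
assert np.linalg.norm(mix, 2) <= 1.0 + 1e-9
```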
Recovering measures Recovering a measure given its
moments is another question of interest. Typically one is
given some set of moments up to order d, and the goal is
to recover an atomic measure that is consistent with these
moments. In the limited information setting, we wish to
recover a measure given a small set of linear combinations
of the moments. Such a question is also of interest in system
identification. The set A in this setting is the moment
curve, and its convex hull goes by several names including
the Caratheodory orbitope [18]. Discretized versions of this
problem correspond to the set A being a finite number of
points on the moment curve; the convex hull C(A ) is then
a cyclic polytope [19].
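A sketch of the discretized version (our own illustration; the grid, degree, and atom locations are arbitrary choices). With moments up to order d ≥ 2k, a nonnegative k-atomic representing measure is unique, so nonnegative least squares over the discretized moment curve recovers it:

```python
import numpy as np
from scipy.optimize import nnls

# Discretized moment curve: atom v(t) = (1, t, ..., t^d) for grid points t.
d = 6
grid = np.linspace(0.0, 1.0, 51)
V = np.vander(grid, N=d + 1, increasing=True).T    # (d+1) x 51 atom matrix

# A 2-atom measure supported on the grid, observed through its moments 0..d.
idx = np.array([10, 40])                           # atoms at t = 0.2 and 0.8
w_true = np.array([0.7, 1.3])
y = V[:, idx] @ w_true

# d >= 2*(number of atoms), so the nonnegative representing measure is unique.
w_hat, resid = nnls(V, y)
assert resid < 1e-8
assert np.allclose(w_hat[idx], w_true, atol=1e-5)
```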
Low-rank tensor decomposition Low-rank tensor decompositions play an important role in numerous applications throughout signal processing and machine learning.
Developing computational tools to recover low-rank tensors
is therefore of interest. However, a number of technical issues
arise with low-rank tensors including the non-existence of
an SVD analogous to that for matrices, and the difference
between the rank of a tensor and its border rank [5]. A
further computational challenge is that the tensor nuclear
norm defined by convexifying the set of rank-1 tensors is in
general intractable to compute. We discuss further relaxations
to this tensor nuclear norm based on the Positivstellensatz in
Section IV.
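The gap between rank and border rank mentioned above can be seen in a classical small example (our own illustration): a tensor of rank 3 that is a limit of rank-2 tensors, so the best rank-2 approximation problem has no minimizer [5].

```python
import numpy as np

rng = np.random.default_rng(6)
a, b = rng.standard_normal(4), rng.standard_normal(4)

def outer3(u, v, w):
    # Rank-1 third-order tensor u ⊗ v ⊗ w.
    return np.einsum('i,j,k->ijk', u, v, w)

# T has tensor rank 3, but border rank 2.
T = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

def rank2_approx(eps):
    c = a + eps * b
    return (outer3(c, c, c) - outer3(a, a, a)) / eps  # difference of 2 rank-1 terms

err = lambda eps: np.linalg.norm(rank2_approx(eps) - T)
assert err(1e-3) < 0.05 * err(1e-1)    # approximation error shrinks as eps -> 0
```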
III. EXACT RECOVERY VIA CONVEX RELAXATION
A. General recovery result
In this section, we give a general set of sufficient conditions for exact recovery via convex relaxation. We then
apply these conditions to some specific examples (in the
next section) to give bounds on the number of measurements
required in each of these cases for exact recovery. Our results
hold for measurement matrices M ∈ R^{n×p} in which the
entries are independent standard Gaussian variables. This is
equivalent to assuming that the nullspace of the measurement
operator is chosen uniformly at random from the set of
(p − n)-dimensional subspaces. While such a measurement
ensemble may not be suitable in some applications, our
results suggest that a generic set of linear measurements
suffices for recovering x via convex relaxation.
Assume without loss of generality that the unknown object
x to be recovered lies on the boundary of the set C(A ).
It is then easily seen that exact recovery x̂ = x using the
convex program (1) holds if and only if the tangent cone¹ at
x with respect to C(A ) has a transverse intersection with the
nullspace of the operator M. More concretely, let T (x) be
the tangent cone at x with respect to the convex body C(A )
and let N (M) be the nullspace of the operator M. Then we
have exact recovery if and only if:
N (M) ∩ T (x) = {0}.
¹ Recall that the tangent cone at a point x with respect to a convex set C is the set of directions from x to any point in C.
Thus we describe here a recovery result based purely on
analyzing the tangent cone at a point with respect to C(A ).
Following the analysis of Rudelson and Vershynin [17] of
ℓ1 minimization for recovering sparse vectors, our arguments
are based on studying the ‘Gaussian width’ of a tangent cone.
The Gaussian width of a set D is given by

w(D) = E[ sup_{y∈D} yᵀz ],

where z is a random vector with i.i.d. zero-mean, variance-1 Gaussian entries (the expectation is with respect to z). Let S = B^p_{ℓ2} ∩ T(x) be the “spherical” part of the tangent cone T(x). The width of the tangent cone is given by:

w(S) = E[ sup_{y∈S} yᵀz ].
If S corresponds to the intersection of a k-dimensional subspace with B^p_{ℓ2}, then we have that k/√(k+1) ≤ w(S) ≤ √k.
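This subspace case is easy to check by Monte Carlo (our own illustration): for a k-dimensional subspace, sup_{y∈S} yᵀz is the norm of the projection of z onto the subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, trials = 50, 5, 20000

# Random k-dimensional subspace: orthonormal basis via QR.
Q, _ = np.linalg.qr(rng.standard_normal((p, k)))   # p x k, orthonormal columns

# sup over S of y^T z equals the norm of the projection of z onto the subspace.
Z = rng.standard_normal((trials, p))
width_est = np.mean(np.linalg.norm(Z @ Q, axis=1))

# The estimate lands between the stated bounds k/sqrt(k+1) and sqrt(k).
assert k / np.sqrt(k + 1) <= width_est <= np.sqrt(k)
```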
The condition that the nullspace N (M) has a transverse intersection with T (x) is equivalent to requiring that
N (M) ∩ S = {0}. The following theorem due to Gordon
[11] gives the precise scaling required for such a trivial
intersection to occur with overwhelming probability (for
Gaussian random matrices):
Theorem 3.1 (Gordon's Escape Through A Mesh): Let M ∈ R^{n×p} be a Gaussian random matrix. Assuming that w(S) < √n, we have that N(M) ∩ S = {0} with probability greater than

1 − 3.5 exp( −( n/√(n+1) − w(S) )² / 18 ).
Remark The quantity w(S) is the key factor that determines the number of measurements required. Notice that √n/2 < n/√(n+1) < √n, so that the probability of success is lower-bounded by

1 − 3.5 exp{ −( √n − w(S) )² / 36 }.

Hence the number of measurements required for exact recovery is n ≳ w(S)².
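A small helper (our own illustration; the function name is hypothetical) that evaluates the probability lower bound from Theorem 3.1 and shows how it becomes nontrivial once n exceeds w(S)² by a constant factor:

```python
import math

def gordon_success_bound(n, w):
    # Probability lower bound from Theorem 3.1, with lam_n = n / sqrt(n + 1),
    # clamped at 0 since the bound is vacuous for small n.
    lam = n / math.sqrt(n + 1)
    if w >= lam:
        return 0.0
    return max(0.0, 1.0 - 3.5 * math.exp(-(lam - w) ** 2 / 18.0))

assert gordon_success_bound(200, 5.0) > 0.95   # n well above w^2 = 25
assert gordon_success_bound(30, 5.0) == 0.0    # n near w^2: bound is vacuous
```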
B. Examples
While Gordon’s theorem provides a simple characterization of the number of measurements required, it is in general
not easy to compute the Gaussian width of a cone. Rudelson
and Vershynin [17] have worked out Gaussian widths for
the special case of tangent cones at sparse vectors on the
boundary of the ℓ1 ball, and derived results for sparse vector recovery using ℓ1 minimization that improve upon previous
results.
In order to obtain more general results the key property
that we exploit is symmetry. In particular many of our
examples in Section II-C lead to symmetric convex bodies
C(A ) such as the orbitopes described in [18].
Proposition 3.2: Suppose that the set A is a finite collection of m points such that the convex hull C(A) is a vertex-transitive polytope [19] that contains as its vertices all the points in A. Using the convex program (1) we have that
O(log(m)) random Gaussian measurements suffice for exact
recovery of a point in A , i.e., a vertex of C(A ).
The proof will be given in a longer report. One can also
bound the Gaussian width of a cone by using Dudley’s
inequality [9], [13] in terms of the covering number at
all scales. Using these results and other properties of the
Gaussian width we obtain specific bounds for the following
examples:
1) For the problem of recovering sparse vectors from linear measurements using ℓ1 minimization, Rudelson and Vershynin [17] have worked out the width for tangent cones at boundary points with respect to the ℓ1 ball. These results state that O(k log(p)) measurements suffice to recover a k-sparse vector in R^p via ℓ1 minimization.
2) For the nuclear norm, we obtain via width based
arguments that O(kp) measurements suffice to recover
a rank-k matrix in R p×p using nuclear norm minimization. This agrees with previous results [16].
3) For a unique sign-vector solution in R^p to a linear system of equations, we require that the linear system contain at least p/2 random equations. This agrees with results obtained in [8], [14].
4) In order to recover a p × p permutation matrix via a
convex formulation involving the Birkhoff polytope,
we appeal to Proposition 3.2 and the fact that there are
p! permutation matrices of size p × p to conclude that
we require n = O(p log(p)) measurements for exact
recovery.
5) Again using symmetry one can show that n ≈ 3p²/4 measurements are sufficient in order to recover a p × p orthogonal matrix via spectral norm minimization.
We present width computations for other examples in more
detail in a longer report.
IV. COMPUTATIONAL ISSUES
In this section we consider the computational problem that
arises if the convex set C(A ) is not tractable to describe. In
particular we focus on the case when the convex set C(A ) is
centrally symmetric and thus defines a norm (appropriately
recentered and perhaps with restriction to a subspace). Therefore the question of approximating the convex set C(A) becomes one of approximating the atomic norm ‖·‖A. We note that the dual norm ‖·‖∗A is given by:
‖y‖∗A = sup{ zᵀy | z ∈ A }.

This characterization is obtained from Bonsall's atomic decomposition theorem [2].
The hierarchy of relaxations that we describe (based on the Positivstellensatz [1]) is more naturally specified with respect to the dual of the convex program (1):

max_q bᵀq   s.t.   ‖M†q‖∗A ≤ 1,     (2)

where M† refers to the adjoint (transpose) of the operator M. Thus the dual convex program (2) can be rewritten as

max_q bᵀq   s.t.   zᵀM†q ≤ 1, ∀z ∈ A.

Notice that zᵀM†q ≤ 1 for all z ∈ A if and only if the following set is empty:

{ z | zᵀM†q > 1, z ∈ A }.

In order to certify the emptiness of this set, one can use a sequence of increasingly larger semidefinite programs based on the Positivstellensatz [1], [15]. This hierarchy is obtained from SDP representations of sum-of-squares polynomials [15]. We provide more details in a longer report.
These Positivstellensatz-based relaxations provide inner approximations to the set {q | ‖M†q‖∗A ≤ 1}, and therefore a lower bound on the optimal value of the dual convex program (2). Consequently, they can be viewed as outer approximations to the atomic norm ball C(A). This interpretation leads to the following interesting conclusion: if a point x̃ is an extreme point of the convex set C(A), then the tangent cone at x̃ with respect to C(A) is smaller than the tangent cone at (an appropriately scaled) x̃ with respect to the outer approximation. Thus we have a stronger requirement, in terms of a larger number of measurements, for exact recovery; this provides a trade-off between the complexity of the convex relaxation procedure and the amount of information required for exact recovery.

V. CONCLUSION

We developed a framework for solving ill-posed linear inverse problems in which the underlying object to be recovered has algebraic structure. Our algorithmic approach for solving this problem is based on computing convex hulls of algebraic sets and using these to provide tractable convex relaxations. We also provided a simple and general recovery condition, and a set of semidefinite programming-based methods to approximate the convex hulls of these algebraic sets in case they are intractable to characterize exactly. We give more details and additional recovery results (with more example applications) as well as an algebraic-geometric perspective in a longer report.
REFERENCES
[1] Bochnak, J., Coste, M., and Roy, M. (1988). Real Algebraic Geometry. Springer.
[2] Bonsall, F. F. (1991). A general atomic decomposition theorem and Banach's closed range theorem. The Quarterly Journal of Mathematics, 42(1):9–14.
[3] Candès, E. J., Romberg, J., and Tao, T. (2006). Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Info. Theory, 52:489–509.
[4] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. of Comput. Math., 9:717–772.
[5] De Silva, V. and Lim, L. (2008). Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications: Special Issue on Tensor Decompositions and Applications, 30(3):1084–1127.
[6] Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Comm. on Pure and Applied Math., 59:797–829.
[7] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Info. Theory, 52:1289–1306.
[8] Donoho, D. L. and Tanner, J. (2010). Counting the faces of randomly-projected hypercubes and orthants with applications. Discrete and Computational Geometry, 43:522–541.
[9] Dudley, R. M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Analysis, 1:290–330.
[10] Fazel, M. (2002). Matrix rank minimization with applications. PhD thesis, Dep. Elec. Eng., Stanford University.
[11] Gordon, Y. (1988). On Milman's inequality and random subspaces which escape through a mesh in R^n. Geometric Aspects of Functional Analysis, Isr. Semin. 1986–87, Lect. Notes Math. 1317, 84–106.
[12] Jagabathula, S. and Shah, D. (2010). Inferring rankings using constrained sensing. Preprint.
[13] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Springer.
[14] Mangasarian, O. and Recht, B. (2009). Probability of unique integer solution to a system of linear equations. Preprint.
[15] Parrilo, P. A. (2003). Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming Ser. B, 96(2):293–320.
[16] Recht, B., Fazel, M., and Parrilo, P. A. (2009). Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, to appear.
[17] Rudelson, M. and Vershynin, R. (2006). Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. CISS 2006 (40th Annual Conference on Information Sciences and Systems).
[18] Sanyal, R., Sottile, F., and Sturmfels, B. (2009). Orbitopes. Preprint.
[19] Ziegler, G. (1995). Lectures on Polytopes. Springer.