Finding a maximum linearly dependent set of vectors

Jeffrey Finkelstein    Natali Ruchansky

August 13, 2015

Work-in-progress

Contents

1 Preliminaries
  1.1 Vector spaces
  1.2 Complexity of optimization problems
    1.2.1 Single-objective optimization problems
    1.2.2 Multi-objective optimization problems
  1.3 Parameterized complexity
  1.4 Largest low-rank subset problem

2 Related problems
  2.1 In numerical linear algebra
    2.1.1 Matrix factorization
    2.1.2 Locally linear embedding
    2.1.3 Column subset selection
  2.2 In combinatorial linear algebra
    2.2.1 Minimum relevant variables
    2.2.2 Maximum-likelihood decoding and sparse reconstruction
    2.2.3 Minimum distance in linear codes and minimum circuit

3 Upper bounds

4 Lower bounds

5 Generating other algebraic structures

Copyright 2015 Jeffrey Finkelstein 〈jeffreyf@bu.edu〉 and Natali Ruchansky. This document is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, which is available at https://creativecommons.org/licenses/by-sa/4.0/. The LaTeX markup that generated this document can be downloaded from its website at https://gitlab.com/argentpepper/maxdepset. The markup is distributed under the same license.

1 Preliminaries

1.1 Vector spaces

In this paper, F denotes a field and V denotes a vector space over F. For any prime power q, the set F_q is the finite field of order q. We denote the zero vector by 0. A finite set of n vectors {v1, . . . , vn} is linearly dependent if there are n scalars a1, . . . , an, not all zero, such that a1 v1 + · · · + an vn = 0. The set is linearly independent if a1 v1 + · · · + an vn = 0 implies a1 = · · · = an = 0. The span of a set of vectors is the set of all linear combinations of vectors from the set.

If V is a set of vectors and S is a subset of V, then the conditional span of S with respect to V, denoted span_V(S), is span(S) ∩ V. When V is clear from context, we omit the phrase “with respect to V”. For example, if V is {e1, e2, e3}, the set of standard basis vectors in R^3, and S is {e1 + e2, e1 − e2}, then span_V(S) = {e1, e2}.

1.2 Complexity of optimization problems

Our notation and terminology for optimization problems come from [4]; we give definitions for minimization problems, but the definitions for maximization problems are similar.

1.2.1 Single-objective optimization problems

An optimization problem is a four-tuple (I, S, m, t), where I is a set of instances, S is a solution relation, m : I × S → R+ is a measure function, and t is either min or max. The set of solutions for x is denoted S(x). The optimal measure for an instance x of a minimization problem with measure function m, denoted m*(x), is min_{w ∈ S(x)} m(x, w). The budget problem for a minimization problem P, denoted P_b, is the decision problem {(x, k) | ∃w ∈ S(x) : m(x, w) ≤ k}. The measure function is also known as the objective function; we use the terms interchangeably.

A function f is an r-approximator for an optimization problem if m(x, f(x)) ≤ r(|x|) · m*(x) for each x ∈ I. A function produces optimal solutions if it is a 1-approximator.

1.2.2 Multi-objective optimization problems

We generalize the definition of single-objective optimization problems above to two-objective optimization problems. (Our notation for multi-objective optimization problems differs from the notation of, for example, [10].)
A two-objective optimization problem is a four-tuple (I, S, m, t), where I and S are as above, but m now outputs a pair of positive real numbers and t = (t1, t2), where each ti is either min or max. The projection of the measure function onto the first component of its output is denoted m1, and onto the second, m2. The first constraint problem for a two-objective optimization problem P, denoted P1, is the single-objective optimization problem (I′, S′, m′, t′), where

  I′ = {(x, k) | x ∈ I and k ∈ N}
  S′ = {((x, k), w) | (x, w) ∈ S and m1(x, w) ≤ k}
  m′ = m2
  t′ = t2.

(The inequality in S′ is reversed if the first objective is a maximization instead of a minimization.) The second constraint problem is defined similarly. The first constraint problem is an optimization problem in the second objective, and the second constraint problem is an optimization problem in the first objective. The budget problem, denoted P_b, is the decision problem in which each objective is constrained:

  P_b = {(x, k, ℓ) | ∃w : (x, w) ∈ S and m1(x, w) ≤ k and m2(x, w) ≤ ℓ}.

(Again, the inequalities are reversed when the objectives are maximizations.)

The linear scalarization of a two-objective optimization problem with measure function m is the single-objective maximization problem with measure function m′(x, w) = c1 m1(x, w) + c2 m2(x, w) for some real constants c1 and c2. We denote the linear scalarization of optimization problem P with constants c1 and c2 by L(P, c1, c2). Both the constraint problems and the scalarization are techniques for converting a multi-objective optimization problem into a single-objective optimization problem.

A solution w1 dominates a solution w2 if m(x, w1) ≤ m(x, w2), where the inequality is taken componentwise, and either m1(x, w1) < m1(x, w2) or m2(x, w1) < m2(x, w2). A solution (and its measure) is Pareto-optimal if no other solution dominates it. The Pareto front is the set of Pareto-optimal solutions.
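As a concrete illustration of dominance and the Pareto front, the following sketch (hypothetical helper names, not from any cited work) filters a collection of measure pairs down to the Pareto-optimal ones. It is specialized to measures of type (max, min) as in the problem studied below: larger is better in the first component, smaller is better in the second.

```python
def dominates(m1, m2):
    """For measures m = (cardinality, rank) of type (max, min):
    m1 dominates m2 if it is at least as good in both components
    and strictly better in at least one."""
    return (m1[0] >= m2[0] and m1[1] <= m2[1]
            and (m1[0] > m2[0] or m1[1] < m2[1]))

def pareto_front(measures):
    """The measures not dominated by any other measure in the list."""
    return [m for m in measures
            if not any(dominates(other, m) for other in measures)]

ms = [(3, 2), (3, 1), (2, 1), (4, 4)]
print(pareto_front(ms))  # -> [(3, 1), (4, 4)]
```

Here (3, 2) is dominated by (3, 1) (same cardinality, strictly smaller rank) and (2, 1) is dominated by (3, 1), while (3, 1) and (4, 4) are incomparable and so both Pareto-optimal.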
A multi-objective optimization problem is in the complexity class NPMO if I and S are decidable in deterministic polynomial time and m is computable in deterministic polynomial time.

Lemma 1.1. For each two-objective optimization problem P, if P ∈ NPMO, then P_b ∈ NP.

TODO Define approximation for two-objective optimization problems; there are several notions in [12, Section 5]. Relaxing the budget constraints in the constraint problems is called bi-criteria approximation in [14]. According to [12, Subsection 5.1], “A typical 2-objective problem does not have a single best approximation ratio [...], but there may exist a trade-off curve of incomparable best approximation ratios.”

1.3 Parameterized complexity

We only consider parameterized complexity of optimization problems, in which the parameter is the budget parameter for the measure function. Let P be a single-objective optimization problem with P = (I, S, m, t). The problem P is in the complexity class W[P] if there is a nondeterministic Turing machine M, computable functions f and h, and a polynomial p such that for each input (x, k) to the budget problem P_b,

• if (x, k) ∈ P_b, then M(x, k) accepts,
• if (x, k) ∉ P_b, then M(x, k) rejects,
• M(x, k) halts within f(k)p(n) steps,
• M(x, k) uses at most h(k) log n nondeterministic bits,

where n = |x|. For more details on parameterized complexity, see [11].

1.4 Largest low-rank subset problem

Our problem is a generalization of the problems of finding

• a smallest linearly dependent subset of a given set of vectors,
• a largest low-rank subset of a given set of vectors,
• a maximum-likelihood decoding of any codeword in a given linear code,
• a sparse reconstruction of any signal from a given overcomplete dictionary.

It is the largest low-rank subset problem (abbreviated LLRS), the problem of finding a subset of vectors that optimizes two objectives, maximizing the cardinality and minimizing the rank.
Candidate solutions for this problem provide a set of few vectors that explain most of the information in the full dataset. In information-theoretic terms, the goal is to find a largest subset of smallest complexity.

Definition 1.2 (LLRS(V, F)).
Instance: finite set of vectors V in V.
Solution: T ⊆ V.
Measure: (|T|, rank(T)).
Type: (max, min).

This is a two-objective optimization problem, so there are multiple Pareto-optimal solutions. Optimizing each of the objectives independently is trivial: to minimize rank(T) just choose T to be the empty set, and to maximize |T| just choose T to be V. However, optimizing both simultaneously makes the problem more difficult.

We know that |T| ≥ rank(T), so the solution space looks like this.

[Figure: the feasible solutions lie in the region 0 ≤ rank(T) ≤ |T| ≤ n, on or above the diagonal |T| = rank(T); the Pareto front is the upper-left boundary of this region.]

There are some salient degenerate instances of the problem, in which the Pareto front is trivially enumerable in polynomial time. If V has rank one, then each candidate solution (other than the trivial solution of cardinality zero) lies on the vertical line rank(T) = 1. In this case, each candidate solution is Pareto-optimal.

[Figure: every nontrivial candidate solution lies on the vertical line rank(T) = 1.]

If V is linearly independent, then each candidate solution lies on the line |T| = rank(T). Again, in this case each candidate solution is Pareto-optimal.

[Figure: every candidate solution lies on the diagonal line |T| = rank(T).]

We use the following notation when discussing this problem.

  d   dimension of the vector space V
  n   number of vectors in the input V
  k   budget parameter for the first objective, |T|
  ℓ   budget parameter for the second objective, rank(T)
  q   (prime power) order of a finite field

Usually, the vector space V will be the canonical d-dimensional vector space over Q or F_q, so we omit the V and F from the name of the problem.

The constraint problem LLRS1 is exactly the minimum linear dependence problem, denoted Min LD: given a set of vectors V and a natural number k, find a subset T of V such that |T| ≥ k and rank(T) is minimized.
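For tiny instances, the entire Pareto front of LLRS can be computed by exhausting all subsets. The sketch below is illustrative only (it runs in time exponential in n, and all names are hypothetical); vectors over F_2 are encoded as integer bitmasks, and rank is computed by XOR elimination.

```python
from itertools import combinations

def rank_f2(masks):
    """Rank over F_2 of vectors encoded as int bitmasks."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for x in masks:
        while x:
            h = x.bit_length() - 1
            if h in pivots:
                x ^= pivots[h]  # eliminate the leading bit of x
            else:
                pivots[h] = x   # x is independent of the basis so far
                break
    return len(pivots)

def pareto_front_llrs(V):
    """All Pareto-optimal measures (|T|, rank(T)), by exhausting subsets."""
    measures = {(len(T), rank_f2(T))
                for k in range(len(V) + 1)
                for T in combinations(V, k)}
    # Type (max, min): o dominates m if o is at least as large in |T|,
    # at least as small in rank, and differs from m in some component.
    return sorted(m for m in measures
                  if not any(o != m and o[0] >= m[0] and o[1] <= m[1]
                             for o in measures))

# V = {e1, e2, e1 + e2, e3} in F_2^3, as bitmasks
V = [0b100, 0b010, 0b110, 0b001]
print(pareto_front_llrs(V))  # -> [(0, 0), (1, 1), (3, 2), (4, 3)]
```

Note that the trivial empty solution, with measure (0, 0), appears on the front, as observed above; the other Pareto-optimal measures trade cardinality against rank.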
The second constraint problem LLRS2 is exactly the maximum feasible (homogeneous) linear subsystem problem, denoted Max FLS: given a set of vectors V and a natural number ℓ, find a subset T of V such that rank(T) ≤ ℓ and |T| is maximized.

2 Related problems

2.1 In numerical linear algebra

In these problems, vectors are interpreted as elements of a metric space, as opposed to elements of a vector space (in which there is no concept of distance, closeness, or neighborhoods).

2.1.1 Matrix factorization

2.1.2 Locally linear embedding

2.1.3 Column subset selection

2.2 In combinatorial linear algebra

These problems all concern finding a subset of vectors that minimizes some property.

2.2.1 Minimum relevant variables

See [4]. . .

2.2.2 Maximum-likelihood decoding and sparse reconstruction

In coding theory, the maximum-likelihood decoding problem is. . . This problem is defined for finite fields. It is NP-complete [5], W[1]-complete [1, Lemma C.2], and not decidable by any deterministic Turing machine in subexponential time (assuming the exponential time hypothesis) [6].

In signal processing, the sparse reconstruction problem is. . . This problem is defined for rationals. It is NP-complete [17, Theorem 1].

2.2.3 Minimum distance in linear codes and minimum circuit

In coding theory, the minimum distance problem is. . . The problem is defined for finite fields. It is NP-complete [20] and in W[2] [7, Theorem 7]. TODO Prove that it is W[1]-hard by adapting the reduction from [19] to finite fields; this is tangential but would be very cool.

In signal processing, the minimum circuit problem is. . . This problem is defined for rationals. It is NP-complete [19, Theorem 1].

3 Upper bounds

Theorem 3.1. For each positive integer d and each prime power q,

1. LLRS over Q^d is in NPMO,
2. LLRS over F_q^d is in NPMO.

Proof. The set of instances and the solution relation are decidable in polynomial time.
The measure function is computable in polynomial time because computing |T| can be done in linear time and the rank of T can be computed in polynomial time via Gaussian elimination (over either field).

Theorem 3.2. For each positive integer d and each prime power q,

1. LLRS1 over Q^d is in W[P] (with respect to ℓ),
2. LLRS1 over F_q^d is in W[P] (with respect to ℓ),
3. LLRS2 over Q^d is in W[P] (with respect to k),
4. LLRS2 over F_q^d is in W[P] (with respect to k).

Proof. The algorithm proceeds as follows on input set V, where V = {v1, . . . , vn}.

• Nondeterministically choose I ⊆ {1, . . . , n} of cardinality k.
• Compute T = {vi ∈ V | i ∈ I}.
• Compute rank(T).
• Accept if and only if the rank of T is at most ℓ.

Let N be the number of bits required to represent V. If r denotes the maximum number of bits required to represent any entry in any vector in V, then N = O(dnr). Choosing k natural numbers less than or equal to n requires O(k log n) nondeterministic bits. Computing T from I requires O(N) time. Computing the rank of T can be done by performing Gaussian elimination on the d × k matrix whose columns are the vectors of T and whose entries can be represented with r bits. In this case, the running time of Gaussian elimination (over Q or F_q) is O(p(dkr)) for some polynomial p. Since we can assume without loss of generality that ℓ ≤ k ≤ n, the overall running time is therefore O(p(N)).

If f is the constant function and h the identity function, then the problem is decidable by a nondeterministic Turing machine halting in time O(f(k)p(N)) using at most O(h(k) log N) nondeterministic bits. If h is instead the constant function, the problem is decidable in time O(f(ℓ)p(N)) using O(h(ℓ) log N) nondeterministic bits. This proves membership in W[P] for either field and either parameter.
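The nondeterministic choice in this algorithm can be simulated deterministically by exhausting all k-subsets, at the cost of exponential time. The following sketch is illustrative only (hypothetical names, not part of the proof); it decides the budget problem for LLRS2 over the rationals, using exact Gaussian elimination via Python's fractions module.

```python
from fractions import Fraction
from itertools import combinations

def rank_q(vectors):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        # Find a pivot in column c at or below row r.
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def llrs2_budget(V, k, ell):
    """Is there T with |T| >= k and rank(T) <= ell?

    A superset can only have larger rank, so it suffices
    to examine the k-subsets themselves.
    """
    return any(rank_q(T) <= ell for T in combinations(V, k))

V = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
print(llrs2_budget(V, 3, 2))  # -> True: {e1, e2, e1+e2} has rank 2
print(llrs2_budget(V, 4, 2))  # -> False: all four vectors have rank 3
```

The exhaustive loop over k-subsets replaces the O(k log n) nondeterministic bits of the proof's machine with a factor of roughly n^k deterministic time; everything else (the rank computation and the comparison against ℓ) is the same polynomial-time work.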
Another way to show that the problem is in the W hierarchy would be to show that the property “rank(T) ≤ ℓ” is definable in first-order logic (see [11] for details on descriptive complexity). Depending on the underlying field, this proposition is definable in FOR_Q or FOR_{F_q}, first-order logic extended with a rank operator, as defined in [13, Chapter 4]. For finite fields, this class is a strict superset of the class of first-order logic sentences, evidence (not proof) that this problem cannot be defined this way. TODO Is LLRS in W[t] for some fixed natural number t?

The rank function and the cardinality function are both polymatroidal, that is, submodular, non-decreasing, and having zero as the image of the empty set. Thus, LLRS is a subproblem of the more general two-objective optimization problem of simultaneously maximizing one polymatroidal function while minimizing another polymatroidal function. The two constraint problems are studied in [14]. The specific case of minimizing a submodular function subject to a cardinality lower bound constraint is studied in [18].

Theorem 3.3. For each positive integer d and each prime power q,

1. LLRS1 over Q^d is in polyAPX,
2. LLRS1 over F_q^d is in polyAPX,
3. LLRS2 over Q^d is in polyAPX,
4. LLRS2 over F_q^d is in APX.

Proof. The first constraint problem is the problem of minimizing rank (a submodular function) subject to a lower bound constraint on cardinality (another submodular function). There are several approximation algorithms for minimizing a submodular function subject to such constraints.

• There is a probabilistic polynomial-time 5√(n/log n)-approximator if k ≤ |V|/2, by running the algorithm from [18, Theorem 4.3] with the budget 2k, given inputs V and k.
• There is a deterministic polynomial-time n-approximator [16, Theorem 5.4].
• There is a deterministic polynomial-time O(√n log^1.5 n)-approximator [14, Corollary 4.5].
The second constraint problem is the problem of maximizing cardinality subject to an upper bound constraint on rank. Again, there are several approximation algorithms for maximizing a submodular function subject to a submodular upper bound constraint.

• There is a probabilistic polynomial-time 2-approximator if ℓ ≥ 5√(n/log n), by running the algorithm from [18, Theorem 4.3] with the budget ℓ/(5√(n/log n)), given inputs V and ℓ.
• There is a deterministic polynomial-time (1 + 1/(e − 1))-approximator if ℓ ≥ Ω(√n log n), by running the algorithm from [14, Corollary 4.10] with the budget ℓ/O(√n log n), given inputs V and ℓ.
• There is a deterministic polynomial-time n-approximator [14, Theorem 4.7].

In the special case of finite fields, there is a q-approximator for the second constraint problem [2, Proposition 3.2.7].

4 Lower bounds

Theorem 4.1. For each positive integer d and each prime power q,

1. LLRS_b over Q^d is NP-complete, even for vectors in {+1, −1}^d,
2. LLRS_b over F_q^d is NP-complete.

Proof. Lemma 1.1 and Theorem 3.1 together prove that the problems are in NP. The budget problem for LLRS is exactly the budget problem for Max FLS (and the budget problem for Min LD, though we don't use it here). Max FLS_b over Q^d is NP-complete even when the vectors have bipolar entries [3, Corollary 2]. Over F_p^d for any prime p, the problem is NP-complete, as implicitly proven in [2, Proposition 3.2.7]. Since a prime number is a prime power, the problem remains NP-complete for prime powers.

Theorem 4.2. For each positive integer d and each prime power q,

1. LLRS1 over Q^d is ??? (with respect to ℓ),
2. LLRS1 over F_q^d is ??? (with respect to ℓ),
3. LLRS2 over Q^d is ??? (with respect to k),
4. LLRS2 over F_q^d is ??? (with respect to k).

Proof. TODO Fill me in with W[1]-hardness

The constraint problems for LLRS inherit the hardness of approximability of both Min LD and Max FLS.

Theorem 4.3. For each positive integer d and each prime power q,

1. LLRS1 over Q^d is ???,
2.
LLRS1 over F_q^d is ???,
3. LLRS2 over Q^d is APX-hard,
4. LLRS2 over F_q^d is APX-complete.

Proof. The second constraint problem over rationals is APX-hard [3, Theorem 5] and over finite fields APX-complete [2, Proposition 3.2.7].

It may be the case that the linear scalarization is hard to approximate as well. A nonnegative scalar multiple of a submodular function remains submodular, so the linear scalarization is the maximization of the difference of submodular functions. Maximizing the difference of submodular functions is equivalent to minimizing the difference of submodular functions, which is not approximable to within any polynomial-time computable factor unless P = NP [15, Theorem 5.1]. However, the linear scalarization for this particular problem may admit an approximation algorithm.

5 Generating other algebraic structures

We might also consider the corresponding largest low-rank subset problem for groups (abbreviated LLRSG). The rank of a subset T of group elements is the minimum number of elements required to generate a subgroup that contains T. (A generating set for a group acts like a spanning set for a linear space.)

Definition 5.1 (LLRSG).
Instance: finite group G, finite set of group elements V.
Solution: T ⊆ V.
Measure: (|T|, rank(T)).
Type: (max, min).

Maximizing |T| seems awfully close to the goal of the algorithm given in [8, Algorithm 4.6] that finds a basis B maximizing |⟨Z⟩ ∩ B|, where Z is a finite subset of a free group F of finite rank. However, I don't understand that paper.

Theorem 5.2. If the input group is given as a Cayley table, there is a polynomial-time computable function that outputs the set of Pareto-optimal solutions for LLRSG.

Proof. We use the fact that each finite group of order n has a generating set of size at most log n [9]. The input to the algorithm is a finite group G and a set of group elements V.

1. Enumerate each subset S of G of cardinality at most log n.
2. Let T = ⟨S⟩ ∩ V and compute |T| and rank(T).
3.
Output the sets T that are not dominated by any other solution set.

There are a polynomial number of sets of cardinality at most log n. Computing ⟨S⟩ can be done in polynomial time (for example, by enumerating each group element and determining group membership), and computing the rank of T can be done in polynomial time by the technique described in [9]. Determining whether a solution T is not dominated by another solution requires comparing the two components of the measure function as computed in the previous step. There are a polynomial number of solution sets, so the number of pairwise comparisons remains a polynomial; these comparisons can again be performed in polynomial time. Overall, the running time of the algorithm is polynomial in n, the order of the group G.

TODO Can we generalize from vector spaces to matroids?

References

[1] Amir Abboud, Kevin Lewi, and Ryan Williams. “On the parameterized complexity of k-SUM”. In: arXiv (2013). url: http://arxiv.org/abs/1311.3054. Preprint.

[2] Edoardo Amaldi. “From Finding Maximum Feasible Subsystems of Linear Systems to Feedforward Neural Network Design”. PhD thesis. École Polytechnique Fédérale de Lausanne, 1994. url: http://www.cs.cornell.edu/Info/People/amaldi/thesis.ps.

[3] Edoardo Amaldi and Viggo Kann. “The complexity and approximability of finding maximum feasible subsystems of linear relations”. In: Theoretical Computer Science 147.1–2 (1995), pp. 181–210. issn: 0304-3975. doi: 10.1016/0304-3975(94)00254-G.

[4] Giorgio Ausiello et al. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, 1999. isbn: 9783540654315.

[5] Elwyn R. Berlekamp, Robert J. McEliece, and Henk C. A. van Tilborg. “On the Inherent Intractability of Certain Coding Problems”. In: IEEE Transactions on Information Theory 24.3 (May 1978), pp. 384–386. issn: 0018-9448. doi: 10.1109/TIT.1978.1055873.

[6] Arnab Bhattacharyya et al.
“The Complexity of Linear Dependence Problems in Vector Spaces”. In: Innovations in Computer Science. 2011, pp. 496–508. url: http://conference.itcs.tsinghua.edu.cn/ICS2011/content/paper/33.pdf.

[7] David Cattanéo and Simon Perdrix. “The Parameterized Complexity of Domination-Type Problems and Application to Linear Codes”. In: Theory and Applications of Models of Computation. Ed. by T.V. Gopal et al. Vol. 8402. Lecture Notes in Computer Science. Springer International Publishing, 2014, pp. 86–103. isbn: 978-3-319-06088-0. doi: 10.1007/978-3-319-06089-7_7.

[8] Warren Dicks. “On free-group algorithms that sandwich a subgroup between free-product factors”. In: Journal of Group Theory 17.1 (Sept. 2013), pp. 13–28. issn: 1435-4446. doi: 10.1515/jgt-2013-0036.

[9] Jeffrey Finkelstein. “Computing rank of finite algebraic structures with limited nondeterminism”. 2015. url: http://cs-people.bu.edu/jeffreyf#grouprank. Manuscript.

[10] Krzysztof Fleszar et al. “Structural Complexity of Multiobjective NP Search Problems”. In: LATIN 2012: Theoretical Informatics. Ed. by David Fernández-Baca. Vol. 7256. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 338–349. isbn: 978-3-642-29343-6. doi: 10.1007/978-3-642-29344-3_29.

[11] Jörg Flum and Martin Grohe. Parameterized Complexity. Texts in Theoretical Computer Science. Ed. by Wilfried Brauer, Grzegorz Rozenberg, and Arto Salomaa. Springer, 2006. isbn: 978-3-540-29952-3.

[12] Christian Glaßer et al. Hardness and Approximability in Multi-Objective Optimization. Tech. rep. 2010. url: http://eccc.hpi-web.de/report/2010/031/.

[13] Bjarki Holm. “Descriptive Complexity of Linear Algebra”. PhD thesis. University of Cambridge, 2010.

[14] Rishabh K. Iyer and Jeff A. Bilmes. “Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints”. In: Advances in Neural Information Processing Systems 26. Ed. by C.J.C. Burges et al.
Curran Associates, Inc., 2013, pp. 2436–2444. url: http://papers.nips.cc/paper/4911-submodular-optimization-with-submodular-cover-and-submodular-knapsack-constraints.pdf.

[15] Rishabh Iyer and Jeff Bilmes. “Algorithms for Approximate Minimization of the Difference Between Submodular Functions, with Applications”. In: Conference on Uncertainty in Artificial Intelligence. 2012, pp. 407–417. url: http://arxiv.org/abs/1408.2051.

[16] Rishabh Iyer, Stefanie Jegelka, and Jeff Bilmes. “Fast Semidifferential-based Submodular Function Optimization”. In: Proceedings of the 30th International Conference on Machine Learning. 2013, pp. 855–863.

[17] B. K. Natarajan. “Sparse Approximate Solutions to Linear Systems”. In: SIAM Journal on Computing 24.2 (1995), pp. 227–234. doi: 10.1137/S0097539792240406.

[18] Zoya Svitkina and Lisa Fleischer. “Submodular Approximation: Sampling-based Algorithms and Lower Bounds”. In: SIAM Journal on Computing 40.6 (2011), pp. 1715–1737. doi: 10.1137/100783352.

[19] A.M. Tillmann and M.E. Pfetsch. “The Computational Complexity of the Restricted Isometry Property, the Nullspace Property, and Related Concepts in Compressed Sensing”. In: IEEE Transactions on Information Theory 60.2 (Feb. 2014), pp. 1248–1259. issn: 0018-9448. doi: 10.1109/TIT.2013.2290112.

[20] Alexander Vardy. “The Intractability of Computing the Minimum Distance of a Code”. In: IEEE Transactions on Information Theory 43.6 (Nov. 1997), pp. 1757–1766. issn: 0018-9448. doi: 10.1109/18.641542.