Property Testing via Set-Theoretic Operations The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Chen, Victor, Madhu Sudan and Ning Xie. "Property Testing via Set-Theoretic Operations ." Proceedings of the Second Symposium on Innovations in Computer Science - ICS 2011, Tsinghua University, Beijing, China, January 7-9, 2011. p.211-222. As Published http://conference.itcs.tsinghua.edu.cn/ICS2011/content/paper/32. pdf Publisher ITCS, Tsinghua University Press Version Author's final manuscript Accessed Thu May 26 20:32:14 EDT 2016 Citable Link http://hdl.handle.net/1721.1/67827 Terms of Use Creative Commons Attribution-Noncommercial-Share Alike 3.0 Detailed Terms http://creativecommons.org/licenses/by-nc-sa/3.0/ Property Testing via Set-Theoretic Operations Victor Chen∗ Madhu Sudan† Ning Xie‡ Abstract Given two testable properties P1 and P2 , under what conditions are the union, intersection or setdifference of these two properties also testable? We initiate a systematic study of these basic set-theoretic operations in the context of property testing. As an application, we give a conceptually different proof that linearity is testable, albeit with much worse query complexity. Furthermore, for the problem of testing disjunction of linear functions, which was previously known to be one-sided testable with a super-polynomial query complexity, we give an improved analysis and show it has query complexity O(1/2 ), where is the distance parameter. ∗ Princeton University. vychen@princeton.edu. Microsoft Research, One Memorial Drive, Cambridge, MA 02142. madhu@csail.mit.edu. Research supported in part by NSF Award CCR-0514915. ‡ MIT CSAIL. ningxie@csail.mit.edu. Research supported in part by NSF Award CCR-0514771, 0728645 and 0732334. † 0 1 Introduction During the last two decades, the size of data sets has been increasing at an exponential rate, rendering a linear scan of the whole input an unaffordable luxury. Thus, We need sublinear time algorithms that read a vanishingly small fraction of their input and still output something intelligent and non-trivial about the properties of the input. The model of property testing [33, 22] has been very useful in understanding the power of sublinear time. Property testing is concerned with the existence of a sublinear time algorithm that queries an input object a small number of times and decides correctly with high probability whether the object has a given property or whether it is “far away” from having the property. We model input objects as strings of arbitrary length, which can also be viewed as a function on arbitrarily large domain. Formally, let R be a finite set and D = {Dn }n>0 be a parametrized family of domains. RD denote the set of all functions mapping from D to R. A property P is simply specified by a family of functions P ⊆ RD . A tester for property P is a randomized algorithm which, given the oracle access to an input function f ∈ RD together with a distance parameter , distinguishes with high probability (say, 2/3) between the case that f satisfies P and the case that f is -far from satisfying P. Here, distance between functions f, g : D → R, denoted dist(f, g), is simply the probability that Prx∈D [f (x) 6= g(x)], where x is chosen uniformly at random from D, and dist(f, P) = ming∈P {dist(f, g)}. We say f is -far from P if dist(f, P) ≥ and -close otherwise. The central parameter associated with a tester is the number of oracle queries it makes to the function f being tested. Property testing was first studied by Blum, Luby and Rubinfeld [18] and was formally defined by Rubinfeld and Sudan [33]. The systematic exploration of property testing for combinatorial properties was initiated by Goldreich, Goldwasser, and Ron [22]. Subsequently, a rich collection of properties have been shown to be testable [8, 7, 3, 19, 31, 5, 4, 26, 25]. Perhaps the most fundamental question in property testing is the following: which properties have local testing algorithms whose running time depends only on the distance parameter ? Are there any attributes that make a property locally testable? Questions of this type in the context of graph property testing were first raised in [22] and later received a lot of attention. Some very general results have been obtained [2, 8, 7, 21, 3, 19], leading to an (almost) complete qualitative understanding of which graph properties are efficiently testable in the dense graph model (see [14] for some recent progress in the sparse graph model). In addition, for an important class of properties, namely H-freeness for fixed subgraphs H, it is known exactly for which H, testing H-freeness requires the query complexity to be super-polynomial in 1/ and for which only a polynomial number of queries suffice: This was shown by Alon [1] for one-sided error testers and by Alon and Shapira [6] for general two-sided error testers. Progress toward similar understanding has also been made for hypergraph properties [32, 9, 7]. However, much less is known for algebraic properties. In a systematic study, Kaufman and Sudan [27] examined the query complexity of a broad class of algebraic properties based on the invariance of these properties under linear transformations. Roughly speaking, they showed that any locally-characterized linearinvariant and linear1 properties are testable with query complexities polynomial in 1/. Non-linear linearinvariant properties were first shown to be testable by Green [24] and were formally studied in [15]. The properties studied in [24, 15] are “pattern-freeness” of Boolean functions, which has been attracting considerable attention [24, 15, 34, 29, 16], as such a study may lead to a complete characterization of testability for functions, analogous to the setting of graphs. 1 A property F is linear if for any f and g that are in F necessarily implies that f + g is in F. 1 1.1 Motivation for set-theoretic operations In this paper we propose a new paradigm to systematically study algebraic property testing. First, decompose a natural algebraic property into the union or intersection (or some other set operation) of a set of “atomic properties”. Second, try to show that each of these atomic properties is testable. Finally, prove that some “composite” property obtained from applying some set theoretic operations on the (testable) atomic properties is also testable. A prominent example is the set of low-degree polynomials [4, 26, 25]. It is easy to d+1 see that the property of being a degree-d polynomial over GF(2) is simply the intersection of 22 −2 atomic properties. Indeed, let Pd denote the set of n-variate polynomials of degree at most d. Then, by the characterization of low-degree polynomials (see, e.g., [4]), f ∈ Pd if and only if for every x1 , . . . , xd+1 ∈ Fn2 , X X f( xi ) ≡ 0 (mod 2). ∅6=S⊆[d+1] i∈S Now fix an ordering of the non-trivial subsets of [d + 1] = {1, 2, . . . , d + 1}. Let ~b be a bit-string d+1 of length 22 −1 with an odd number of ones and Pd,~b denote the set of functions f such that the string P d+1 hf ( i∈S xi )i∅6=S⊆[d+1] is not equal to b. By definition Pd is the intersection of 22 −2 “b-free” properties Pd,~b ’s.2 In order to carry out this program of decomposing an algebraic properties into atomic ones, one must have a solid understanding of how basic set-theoretic operations affect testability. For instance, given two testable properties, is the union, intersection, or set-difference also testable? Previously, Goldreich, Goldwasser and Ron considered such questions in their seminal paper [22]. They observed that the union of two testable properties is always testable (cf. Section 3.1) but also provided examples showing that in general, testability is not closed under other set-theoretic operations. Thus, current understanding of testability via set-theoretic operations seems insufficient to carry out the above mentioned program of attack. 1.2 Our results In this paper, we show more positive results for these basic set-theoretic operations and illustrate several applications. We now describe our contribution in more detail. Set-theoretic operations We provide sufficient conditions that allow local testability to be closed under intersection and set difference. Given two locally testable properties, we show that if the two properties (minus their intersection) are sufficiently far apart, then their intersection is also locally testable. For set difference, a similar statement can also be made, albeit with more technicality, requiring that one of the properties must be “tolerantly testable”. A more detailed treatment of these set operations appears in Section 3. We remark that in the general case, testability is not closed under most set operations. Thus, putting restrictions on these properties is not unwarranted. Applications of these set-theoretic considerations appear in Sections 4.2 and 4.3. Furthermore, Section 4.3 demonstrates the simplicity that comes from these set-theoretic arguments. There, via set theory, we define a new property from an established one, and show that the new property’s testability, in terms of both upper and lower bound, is inherited from the property where it was derived. In fact, some of these 22 permutation of the xi ’s. 2 d+1 −2 properties are identical since the set of non-trivial subsets generated by xi is invariant under 2 Disjunction of linear functions In addition to set theory, it is also natural to ask whether testability is closed under the closure of some fundamental, unary operations. For instance, given a testable property P, under what condition is its additive closure ⊕P testable? A similar question can also be asked for the disjunctive operator ∧, which is one of the most basic operations used to combine formulas. Given a testable property P, is its disjunctive closure ∧P testable? Trivially, if P is linear, then ⊕P = P and testability is preserved. Furthermore, if P1 and P2 are both linear and linear-invariant as introduced by Kaufman and Sudan [27], then their sumset P1 + P2 is testable. However, in general, not much can be said about how these basic operations affect testability. Here we focus on disjunction’s effect on one specific property, namely the set of linear functions. Before we describe our result, we note some previous works in testing where disjunction played a role. For the disjunction of monomials, Parnas et. al. [31] gave a testing algorithm for s-term monotone DNF with query complexity Õ(s2 /). Diakonikolas et. al. [20] generalized Parnas et. al.’s result to general s-term DNF with query complexity Õ(s4 /2 ). We take a different direction and ask how disjunction affects the testability of the set of linear functions. The property of being a linear Boolean function (see next section for a full discussion), first studied by Blum, Luby and Rubinfeld [18], is testable with query complexity O(1/). As observed in [15], the class of disjunction of linear functions is equal to the class of 100-free functions (see Preliminaries for a definition). There they showed that a sufficiently rich class of “pattern-free” functions is testable, albeit with query complexity a tower of 2’s whose height is a function of 1/. In a different context, the authors in [23] showed implicitly3 that the disjunction of linear functions is testable with query complexity polynomial in 1/, but with two-sided error. Since both [15] and [23] seek to describe rich classes of testable Boolean functions, the bounds in both do not adequately address how disjunction affects the query complexity of the underlying property, the set of linear functions. In Section 4.1, we give a direct proof, showing that the disjunction of linear functions is testable with query complexity O(1/2 ) and has one-sided error. Thus, the blowup from the disjunctive operator is O(1/). It will be interesting to see if the blowup is optimal for this problem. A different proof for linearity testing Linearity testing, first proposed by Blum, Luby and Rubinfeld [18], is arguably the most fundamental and extensively studied problem in property testing of Boolean functions. Due to its simplicity and important applications in PCP constructions, much effort has been devoted to the study of the testability of linearity [18, 12, 11, 10, 28]. For linearity, we indeed are able to carry out the program of decomposing an algebraic property into atomic pattern-free properties, and thus obtain a novel new proof that linearity is testable in Section 4.2. In particular, linearity is easily seen to be equal to the intersection of two atomic properties, namely trianglefreeness (see Section 2) and disjunction of linear functions, which are both testable. The query complexity of linearity in our proof is of the tower-type, drastically worse than the optimal O(1/) bound, where is the distance parameter. We note that our effort in obtaining a new proof lies not in improving the parameters, but in understanding the relationships among these atomic, testable properties. In fact, we believe that despite the poor upper bound, our new proof is conceptually simple and gives evidence that set theory may uncover new testable properties. 3 We thank an anonymous reviewer of ICS 2010 for pointing this out. 3 1.3 Techniques Our new proof that linearity is testable is built on the testability results for triangle freeness (see definition in Section 2) and the disjunction of linear functions. The latter was already shown to be testable in [15]. However, in this work, we give a completely different proof using a BLR-styled approach. Our proof is a novel variant of the classical self-correction method. Consequently, the query upper bound we obtain (quadratic in ) is significantly better than the tower-type upper bound shown in [15]. In fact, to the best of our knowledge, this is the first and only polynomial query upper bound for testing pattern-freeness properties. All other analysis for testing pattern-freeness properties apply some type of “regularity lemma”, thus making tower-type query upper bounds unavoidable. We believe that both the self-correction technique and the investigation of set-operations may be useful in the study of testing pattern-freeness. From the works developed in [34, 29], we know that for every d, the property Pd,~1 is testable.4 However, for an arbitrary b, the testability of Pd,~b remains open. And in general very little can be said about the testability of an arbitrary intersection of these properties. Since Pd is known to be testable using self-correction [4], we believe that self-correction, applied in conjunction with set-theory, may be useful for understanding these pattern-free properties. 2 Preliminaries We now describe some basic notation and definitions that we use throughout the paper. We let N = {0, 1, . . .} denotes the set of natural numbers and [n] the set {1, . . . , n}. We view elements in Fn2 as nbit binary strings, that is elements of {0, 1}n . For x ∈ Fn2 , we write xi ∈ {0, 1} for the ith bit of x. x and PIf n y are two n-bit strings, then x + y denotes bitwise addition (i.e., XOR) of x and y, and x · y = i=1 xi yi (mod 2) denotes the inner product between x and y. We write (x, y) to denote the concatenation of two bit strings x and y. For convenience, sometimes we view a n-bit binary string as a subset of [n], that is, for every x ∈ Fn2 there is a corresponding subset Sx ⊆ [n] such that xi = 1 iff i ∈ Sx for every 1 ≤ i ≤ n. We write |x| to indicate the Hamming weight of x, i.e., the number of coordinates i such that xi = 1. Equivalently, this is also the cardinality of subset Sx . By abuse of notation, we use parentheses to denote multisets; for instance, we write (a, a, b, b, b) for the multiset which consists of two a’s and three b’s. Let f : Fn2 → {0, 1} be a Boolean function. The support of f is supp(f ) = {x ∈ Fn2 : f (x) = 1}. Recall that for two functions f and g defined over the same domain, the (fractional) distance between def these two functions is dist(f, g) = Prx∈D [f (x) 6= g(x)]. Let P1 and P2 be two properties defined over the same domain D, then the distance between these two properties, dist(P1 , P2 ), is simply defined to be minf ∈P1 ,g∈P2 {dist(f, g)}. A Boolean function f : Fn2 → {0, 1} is linear if for all x and y in Fn2 , f (x) + f (y) = f (x + y). We denote the set of linear function by PLIN . Throughout this paper, we will be working with the pattern generated by the triple (x, y, x + y). To this end, we say that a Boolean function f : Fn2 → {0, 1} is (1, 0, 0)-free if for all x and y in Fn2 , (f (x), f (y), f (x + y)) 6= (1, 0, 0), where here and after we view (f (x), f (y), f (x + y)) as well as (1, 0, 0) as multisets5 . Similarly, a (1, 1, 0)-free Boolean function is defined analogously. Lastly, we say that a Boolean function f : Fn2 → {0, 1} is triangle-free if for all x and y in Fn2 , (f (x), f (y), f (x + y)) 6= (1, 1, 1). We denote the set of triangle-free functions by P(111)-FREE . Note that P(111)-FREE is monotone: if f ∈ P(111)-FREE and we modify f by setting some of the points 4 Actually, stronger theorems were proved in [34, 29], but to state their works in full, definitions not needed in this work will have to be introduced. 5 That is, for example, we do not distinguish the case hf (x), f (y), f (x+y)i = h1, 0, 0i from hf (x), f (y), f (x+y)i = h0, 1, 0i. 4 in Fn2 from 1 to 0, then the new function is clearly also triangle-free. We can encapsulate a more general statement as follows: Observation 1. Let f and g be two Boolean functions such that supp(f ) ⊆ supp(g). Then dist(f, P(111)-FREE ) ≤ dist(g, P(111)-FREE ). For concreteness, we provide a formal definition of a tester. Definition 1 (Testability). Let R be a finite set and D = {Dn }n>0 be a parametrized family of domains. Let P ⊆ RD be a property. We say a (randomized) algorithm T is a tester for P with query complexity q(, n) if for any distance parameter > 0, input size n and function f : Dn → R, T satisfies the following: • T queries f at most q(, n) times; • (completeness) if f ∈ P, then Pr[T accepts] = 1; • (soundness) if dist(f, P) ≥ , then Pr[T accepts] ≤ internal randomness used by T . 1 3, where the probabilities are taken over the We say that a property is locally testable if it has a tester whose query complexity is a function depending only on , independent of n. In this work, we use the word testability to describe the stronger notion of local testability. For our main results, we will work with the model case when Dn = Fn2 and R = {0, 1}. 3 Basic theory of set operations In this section, we present some basic testability results based on set-theoretic operations such as union, intersection, complementation, and set-difference. The proofs here are fairly standard and are thus deferred to the Appendix. 3.1 Union It is well known that the union of two testable properties remains testable. This folklore result first appeared in [22]; for completeness, a proof is included in Appendix A. Proposition 1 (Folklore). Let P1 , P2 ⊆ RD be two properties defined over the same domain D = {Dn }n>0 . For i = 1, 2, suppose Pi is testable with query complexity qi (). Then the union P1 ∪ P2 is testable with query complexity O(q1 () + q2 ()). 3.2 Intersection The case of set intersection is more complicated than union. Goldreich et al. showed in[22] (see Proposition 4.2.2) that there exist testable properties whose intersection is not testable. Thus, in general, testability does not follow from the intersection operation. However, testability may still follow in restricted cases. In particular, we show that if two testable properties P1 and P2 minus their intersection are sufficiently far from each other, then their intersection remains testable as well. A proof is included in Appendix B. Proposition 2. Let P1 , P2 ⊆ RD be two properties defined over the same domain D = {Dn }n>0 . Suppose dist(P1 \ P2 , P2 \ P1 ) ≥ 0 for some absolute constant 0 , and for i = 1, 2, Pi is testable with query complexity qi (). Then the intersection P1 ∩ P2 is testable with query complexity O(q1 () + q2 ()), 5 3.3 Complementation Here we examine the effect complementation has on the testability of a property. As it turns out, all three outcomes - both P and P̄ are testable, only one of P and P̄ is testable, and neither P nor P̄ is testable - are possible! The first outcome is the easiest to observe. Note that the property DR and the empty property are complements of each other, and both are trivially testable. The second outcome is observed in Proposition 4.2.3 in [22]. To our knowledge, the third outcome has not been considered before. In fact, previous constructions of non-testable properties, e.g. [22, 13], are sparse. Hence, the complements of these nontestable properties are trivially testable (by the tester that accepts all input functions). One may wonder if in general the complement of a non-testable property must also be testable. We disprove this in the following proposition. Proposition 3. There exists some property P ⊆ RD where R = {0, 1} and D = {Fn2 }n>0 , such that neither P nor P is testable for any < 1/8. By utilizing coding theory, we can bypass the sparsity condition to prove Proposition 3. Essentially, property P consists of neighborhoods around functions that have degree n/2 − 1 as polynomials over F2n . Its complement contains functions that are polynomials of degree n/2. Since d evaluations are needed to specify a polynomial of degree d, any tester for P or P needs (roughly) at least n/2 queries. Using a standard argument involving code concatenation, one can construct P and P to be binary properties that √ require testers of query complexity Ω( n). A formal proof can be found in Appendix C. 3.4 Difference Let P1 and P2 be two properties and let P = P1 \ P2 denote the set difference of the two properties. In this section, we confine our attention to the simple case that P2 ⊂ P1 . Since complementation is a special case of set-difference, from Section 3.3, we know that in general we can infer nothing about the testability of P from the fact that both P1 and P2 are testable. However, under certain restrictions, we still can show that P is testable. First we observe a simple case in which P1 \ P2 is testable. This simple observation, which is obvious and whose proof we omit, is utilized in the proof of Theorem 4 in Section 4.3. Observation 2. Let P2 ⊂ P1 be two testable properties defined over the same domain D = {Dn }n>0 . If for every f ∈ P2 , there is some g ∈ P1 \ P2 such that dist(f, g) = o(1), then P1 \ P2 is testable by the same tester which tests property P1 . Our second observation on set difference relies on the notion of tolerant testing, introduced by Parnas, Ron, and Rubinfeld [30] to investigate testers that are guaranteed to accept (with high confidence) not only inputs that satisfy the property, but also inputs that are sufficiently close to satisfying it. Definition 2 (Tolerant Tester [30]). Let 0 < 1 < 2 < 1 denote two distance parameters and P ⊆ RD be a property defined over the domain D = {Dn }n>0 . We say that property P is (1 , 2 )-tolerantly testable with query complexity q(1 , 2 ) if there is a tester T that makes at most q(1 , 2 ) queries, if for all f with dist(f, P) ≤ 1 , T rejects f with probability at most 1/3, and for all f with dist(f, P) ≥ 2 , T accepts f with probability at most 1/3. We record in the following proposition that if P and P2 are sufficiently far apart and P2 is tolerantly testable, then P is also testable. We include a proof in Appendix D. 6 Proposition 4. Let 1 < 2 < 0 be three absolute constants. Let P2 ⊂ P1 ⊆ RD be two properties defined over the same domain D = {Dn }n>0 . If for every > 0, P1 is testable with query complexity q1 (), P2 is (1 , 2 )-tolerantly testable with query complexity q2 (1 , 2 ), and dist(P1 \ P2 , P2 ) ≥ 0 , then P1 \ P2 is testable with query complexity O(q1 + q2 ) (and completeness 2/3). We note that since P2 is tolerantly testable, it does not have completeness 1. Thus, the set difference P1 \ P2 is not guaranteed to have one-sided error, either. 4 Main results In this section we show two applications of the results developed in Section 3. We stress that set theoretic arguments may be used to show both upper bound results (some properties are testable with only a few number of queries) and lower bound results (some properties can not be tested by any tester with less than certain number of queries). 4.1 Testing disjunction of linear functions Recall that P(100)-FREE is the set of Boolean functions that are free of (1, 0, 0)-patterns for any x, y and x + y in Fn2 . P(100)-FREE was shown to be testable with a tower-type query upper bound in [15]. In this section we employ a BLR-style analysis to show the property P(100)-FREE is testable with much less query complexity. Observe that the property P(110)-FREE is also testable with quadratic query complexity follows an analogous analysis. We first give a simple characterization of a Boolean function being (1,0,0)-free. Proposition 5 ([15]). A function f : Fn2 → {0, 1} is (1,0,0)-free if and only if f is the disjunction (OR) of linear functions (or the all 1 function). Proof. Let S = {x ∈ Fn2 : f (x) = 0}. If S is empty, then f is the all 1 function. Otherwise let x and y be any two elements in S (not necessarily distinct). Then if f is (1, 0, 0)-free, it must be the case that x + y is also in S. Thus S is a linear subspace of Fn2 . Suppose the dimension of S is k with V k ≥V1. Then there are k linearly independent vectors a1 , . . . , ak ∈ Fn2 such that z ∈ S iff {z · a = 0} · · · {z · ak = 0}. 1 W W Therefore, by De Morgan’s law, f (z) = 1 iff z ∈ S̄ iff {z · a1 = 1} · · · {z · ak = 1}, which is equivalent to the claim. Theorem 1. For every distance parameter > 0, the property P(100)-FREE is testable with query complexity O(1/2 ). Proof. Suppose we have oracle access to some Boolean function f : Fn2 → {0, 1}. A natural 3-query test T for P(100)-FREE proceeds as follows. T picks x and y independently and uniformly at random from Fn2 , and accepts iff (f (x), f (y), f (x + y)) 6= (1, 0, 0). def Let R = Prx,y [(f (x), f (y), f (x + y)) 6= (1, 0, 0)] be the rejection probability of T . If f ∈ P(100)-FREE , then R = 0, i.e., T has completeness 1. For soundness, in a series of steps, we shall show that for every > 0, if R < 2 /128, then there exists a Boolean function g such that (1) g is well-defined, (2) dist(f, g) < , and (3) g is in P(100)-FREE . Let µ0 denote Prx [f (x) = 0]. Suppose µ0 < 63/64. Then dist(f, ~1) < 63/64, where ~1 is the all-ones function. Then trivially, taking g = ~1 completes the proof. Thus, henceforth we assume that µ0 ≥ 63/64. 7 For a fixed x ∈ Fn2 , let px00 denote Pry [(f (y), f (x + y)) = (0, 0)], and and px10 is defined similarly. We define g : Fn2 → {0, 1} as follows: if px00 ≥ /4; 0, g(x) = 1, if px10 ≥ /4; f (x), otherwise. Proof of (1). g is well-defined. Suppose not, then there exists some x ∈ Fn2 such that px00 , px10 ≥ /4. Pick y and z independently and uniformly at random from Fn2 . Let E be the event that at least one of (f (y), f (z), f (y + z)) and (f (x + y), f (x + z), f (y + z)) is (1, 0, 0). By assumption, with probability at least 2 /16, f (y) = 1, f (x + y) = 0 and f (z), f (x + z) = 0, which will imply that – regardless of the value of f (y + z) – event E must occur. Thus, 2 /16 ≤ Pr[E]. On the other hand, by the union bound, Pr[E] ≤ 2R < 2 /64, a contradiction. Proof of (2). dist(f, g) < 32 . Suppose x is such that f (x) 6= g(x). By construction, Pry [f (x), f (y), f (x + y)] ≥ /4. This implies that the rejection probability R is at least dist(f, g) · /4. Since R < 2 /128, dist(f, g) < /32. Before proving (3), we first note that for every x ∈ Fn2 , Pr[(g(x), g(y), g(x + y)) = (1, 0, 0)] < y 5 . 16 To see this, note that by construction of g, for every x ∈ Fn2 , Pry [(g(x), f (y), f (x + y)) = (1, 0, 0)] < /4. Since dist(f, g) < /32, by the union bound, we can deduce that the probability that g has a (1, 0, 0)-pattern at (x, y, x + y) is less than /4 + 2 · /32. Proof of (3). g is in P(100)-FREE . Suppose not, that there exist x, y ∈ Fn2 such that g(x) = 1, g(y), g(x + y) = 0. Pick z uniformly at random from Fn2 . Let E denote the event that at least one of (g(x), g(z), g(x + z)), (g(y), g(z), g(y + z)), and (g(x + y), g(x + z), g(y + z)) is (1, 0, 0). A case by case analysis reveals that if g(z) = 0, then event E must occur. Note that the probability that g(z) = 0 is at least 63/64 − /32 = 61/64, since f (z) = 0 occurs with probability at least 63/64 and dist(f, g) < /32. On the other hand, by union bound, we have Pr[g(z) = 0] ≤ Pr[E] ≤ 3·5/16, implying that 61/64 ≤ 15/16, an absurdity. Therefore, we have shown that on any input function that is -far from P(100)-FREE , the rejection probability of T is always at least 2 /128. By repeating the basic test T independently O(1/2 ) times, we can boost the rejection probability of T to 2/3, and thus completing the proof. 8 4.2 A new proof that linearity is testable As an application of our results in Section 3.2, we give a new proof that linear functions are testable based on a set-theoretic argument. To this end, note that the set of linear functions equals to the intersection of (1, 1, 1)-free functions and (1, 0, 0)-free functions, i.e., PLIN = P(111)-FREE ∩ P(100)-FREE . From the previous section, we know that P(100)-FREE is testable. The following theorem due to Green [24] asserts that P(111)-FREE is also testable. Theorem 2 ([24]). The property P(111)-FREE is testable with query complexity W (poly(1/)), where for every t > 0, W (t) denotes the tower of 2’s of height dte. By Proposition 2, to show that linearity is testable, it suffices to show that the two properties P(111)-FREE and P(100)-FREE are essentially far apart. To this end, let us define a new property PNLTF , where NLTF stands for non-linear triangle-freeness: def PNLTF = P(111)-FREE \ PLIN . Lemma 1. We have that P(100)-FREE \ PLIN is 41 -far from PNLTF . We first establish a weaker version of Lemma 1. Proposition 6. Suppose f is a disjunction of exactly two non-trivial linear functions. Then dist(f, P(111)-FREE ) is at least 41 . W Proof. Set N = 2n . Write f (x) = (α · x) (β · x), where α 6= β ∈ Fn2 denote two n-bit vectors not equal to 0n . We say that a tuple (x, y, x + y) where x, y ∈ Fn2 is a triangle in f if f (x), f (y), f (x + y) = 1. We shall show that (1) f has N 2 /16 triangles and (2) for every x, the number of y 0 s such that (x, y, x + y) is a triangle in f is N/4. Together, (1) and (2) will imply that dist(f, P(111)-FREE ) is at least 1/4, since changing the value of f at one point removes at most N/4 triangles. To prove these two assertions, let A = {x ∈ Fn2 : α · x = 1} and B = {x ∈ Fn2 : β · x = 1}. Since supp(f ) = A ∪ B, for every triangle (x, y, x + y) in f , each of the three points x, y, x + y must fall in one of the following three disjoint sets: A \ B, (A ∩ B), B \ A. Furthermore, each of the three points must fall into distinct sets. To see this, suppose that x, y ∈ A \ B. Then by definition, α(x + y) = α(x) + α(y) = 0 and β(x + y) = 0, implying that f (x + y) = 0, a contradiction. So A \ B cannot contain two points of a triangle, and by symmetry, neither can B \ A. The same calculation also reveals that A ∩ B cannot contain two points of a triangle. Thus, a triangle (x, y, x + y) in f must be such that x ∈ A \ B, y ∈ A ∩ B, and x + y ∈ B \ A. In addition, it is easy to check that given two points p1 , p2 from two distinct sets (say A \ B and A ∩ B), their sum p1 + p2 must be in the third set (B \ A). Since these three sets A \ B, (A ∩ B), B \ A all have size N/4, this implies that the number of triangles in f is N 2 /16, proving (1). (2) also follows easily given the above observations. Suppose x ∈ A \ B. For every y ∈ A ∩ B, (x, y, x + y) forms a triangle. Since any triangle that has x as a point must also contain a point in A ∩ B (with the third point uniquely determined by the first two), the number of triangles in f containing x is N/4. The case when x ∈ B \ A or x ∈ A ∩ B is similar. This completes the proof. 9 Now we prove Lemma 1. Proof of Lemma 1. Let f ∈ P(100)-FREE \ PLIN and write f = f1 ∨ f2 , where f1 is a disjunction of exactly two linear functions. By Proposition 6, it follows that dist(f1 , P(111)-FREE ) is at least 1/4. Since P(111)-FREE is monotone and supp(f1 ) ⊆ supp(f ), by Observation 1, we know that dist(f, P(111)-FREE ) ≥ 1/4. Since PNLTF ⊂ P(111)-FREE , dist(f, PNLTF ) ≥ dist(f, P(111)-FREE ), completing the proof. By Theorem 2 and Theorem 1, both P(111)-FREE and P(100)-FREE are testable. Now by combining Proposition 2 and Lemma 1, we obtain the following: Theorem 3. PLIN is testable. We remark that the query complexity for testing linearity in Theorem 3 is of the tower type (of the form W (poly(1/)) because of Theorem 2. This is much worse than the optimal linear query upper bound obtained in [18, 10]. 4.3 A lower bound for testing non-linear triangle-freeness We first show that PLIN is a “thin strip” around PNLTF . Proposition 7. For any Boolean function f , dist(f, P(111)-FREE ) ≥ dist(f, PNLTF ) − 2−n . Proof. The statement is trivially true if dist(f, P(111)-FREE ) = dist(f, PNLTF ). Since PNLTF is a proper subset of P(111)-FREE , we can assume that dist(f, PNLTF ) is strictly larger than dist(f, P(111)-FREE ), implying that the function in P(111)-FREE that has minimum distance to f is actually in PLIN . Call this function g. Then it is easy to see that there exists some function h in PNLTF such that dist(g, h) = 2−n . To this end, note that if g is the all-zero function, we can define h such that h(x) = 1 for some x 6= 0n and 0 everywhere else. By construction h is non-linear but triangle-free. If g is a non-trivial linear function, then we can pick any x ∈ supp(g) and define h(x) = 0 and h(y) = g(y) for all y 6= x. By construction h is non-linear, and since P(111)-FREE is monotone, h remains triangle-free. Thus, by Triangle inequality, we know that dist(f, P(111)-FREE ) = dist(f, g) is at least dist(f, h) − 2−n . This implies that dist(f, P(111)-FREE ) ≥ dist(f, PNLTF ) − 2−n . Since any linear function is 2−n -close to a function in PNLTF , intuitively we expect PNLTF , which is obtained by deleting the strip PLIN from P(111)-FREE , to inherit the testability features of P(111)-FREE . Indeed, we record this next by using the set-theoretic machinery set up in Section 3. Theorem 4. PNLTF is testable, but any non-adaptive ω(1/) queries. 6 tester (with one-sided error) for PNLTF requires Proof. We first observe that PNLTF is testable with one-sided error. By Proposition 7 and Observation 2, the testing algorithm for PNLTF is simply the same as the tester for P(111)-FREE [24]. Next we show that the lower bound for the query complexity of PNLTF is the same as P(111)-FREE . As shown in [17], any one-sided, non-adaptive tester for P(111)-FREE requires ω(1/) queries.7 Suppose PNLTF is testable with one-sided error and has query complexity O(1/). Since PLIN is testable with query complexity O(1/) [18], by Proposition 1 P(111)-FREE = PLIN ∪ PNLTF is testable with one-sided error and has query complexity O(1/), a contradiction. 6 A tester is non-adaptive if all its query points can be determined before the execution of the algorithm, i.e., the locations where a tester queries do not depend on the answers to previous queries. 7 The specific lower bound shown in [17] is Ω(( 1 )1.704··· ) but can be improved to be Ω(( 1 )2.423··· ) as observed independently by Eli Ben-Sasson and the third author of the present paper. 10 5 Concluding remarks We have initiated a general study of the closure of testability under various set operations. Our results show that such a study can lead to both upper and lower bound results in property testing. We believe our answers are far from complete, and further investigation may lead to more interesting results. For example, the def symmetric difference between two properties P1 and P2 is defined to be P1 4 P2 = (P1 \ P2 ) ∪ (P2 \ P1 ). Under what conditions is the property P1 4 P2 testable if both P1 and P2 are testable? Another natural generalization of our approach is to examine properties resulting from a finitely many application of some set-theoretic operations. Our proof that the class of disjunction of linear functions is testable employs a BLR-style self-correction approach. We believe that this technique may be useful in analyzing other non-monotone, pattern-free properties. In particular, it will be interesting to carry out our approach of decomposing an algebraic property into atomic ones for higher degree polynomials. This will, in addition to giving a set-theoretic proof for testing low-degree polynomials, sheds light on how pattern-free properties relate to one another. Finally, our quadratic query complexity upper bound for the disjunction of linear functions opens up a number of directions. In our work, the blowup in query complexity from the disjunction is O(1/). One may vary the underlying properties and the operators to measure the blowup in query complexity. Of particular interest may be understanding how the disjunction affects the testability of low-degree polynomials. Acknowledgments We thank the anonymous referees for numerous suggestions and the reference to [23]. References [1] Noga Alon. Testing subgraphs in large graphs. Random Structures and Algorithms, 21(3-4):359–370, 2002. [2] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing of large graphs. Combinatorica, 20(6):451–476, 2000. [3] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A combinatorial characterization of the testable graph properties: it’s all about regularity. In STOC’06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pages 251–260, 2006. [4] Noga Alon, Tali Kaufman, Michael Krivelevich, Simon Litsyn, and Dana Ron. Testing low-degree polynomials over GF(2). In Proceedings of Random 2003, pages 188–199, 2003. [5] Noga Alon, Michael Krivelevich, Ilan Newman, and Mario Szegedy. Regular languages are testable with a constant number of queries. SIAM Journal on Computing, 30(6):1842–1862, 2000. [6] Noga Alon and Asaf Shapira. Testing subgraphs in directed graphs. Journal of Computer and System Sciences, 69(3):354–382, 2004. [7] Noga Alon and Asaf Shapira. A characterization of the (natural) graph properties testable with onesided error. In FOCS’05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 429–438, 2005. 11 [8] Noga Alon and Asaf Shapira. Every monotone graph property is testable. In STOC’05: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pages 128–137, 2005. [9] Tim Austin and Terence Tao. On the testability and repair of hereditary hypergraph properties. http: //arxiv.org/abs/0801.2179, 2008. [10] Mihir Bellare, Don Coppersmith, Johan Håstad, Marcos A. Kiwi, and Madhu Sudan. Linearity testing over characteristic two. IEEE Transactions on Information Theory, 42(6):1781–1795, 1996. [11] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits, PCPs, and nonapproximability—towards tight results. SIAM Journal on Computing, 27(3):804–915, 1998. [12] Mihir Bellare, Shafi Goldwasser, Carsten Lund, and Alexander Russell. Efficient probabilistically checkable proofs and applications to approximation. In STOC’93: Proceedings of the 25th Annual ACM Symposium on Theory of Computing, pages 304–294, 1993. [13] Eli Ben-Sasson, Prahladh Harsha, and Sofya Raskhodnikova. Some 3CNF properties are hard to test. SIAM Journal on Computing, 35(1):1–21, 2005. Early version in STOC’03. [14] Itai Benjamini, Oded Schramm, and Asaf Shapira. Every minor-closed property of sparse graphs is testable. In STOC’08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 393–402, 2008. [15] Arnab Bhattacharyya, Victor Chen, Madhu Sudan, and Ning Xie. Testing linear-invariant non-linear properties. In STACS’09, pages 135–146, 2009. [16] Arnab Bhattacharyya, Elena Grigorescu, and Asaf Shapira. A unified framework for testing linearinvariant properties. In FOCS’10: Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, 2010. [17] Arnab Bhattacharyya and Ning Xie. Lower bounds for testing triangle-freeness in Boolean functions. In SODA’10: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 87–98, 2010. [18] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences, 47(3):549–595, 1993. [19] Christian Borgs, Jennifer T. Chayes, László Lovász, Vera T. Sós, Balázs Szegedy, and Katalin Vesztergombi. Graph limits and parameter testing. In STOC’06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pages 261–270, 2006. [20] Ilias Diakonikolas, Homin K. Lee, Kevin Matulef, Krzysztof Onak, Ronitt Rubinfeld, Rocco Servedio, and Andrew Wan. Testing for concise representations. In FOCS’07: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 549–558, 2007. [21] Eldar Fischer and Ilan Newman. Testing versus estimation of graph properties. SIAM Journal on Computing, 37(2):482–501, 2007. [22] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998. 12 [23] Parikshit Gopalan, Ryan O’Donnell, Rocco A. Servedio, Amir Shpilka, and Karl Wimmer. Testing Fourier dimensionality and sparsity. In ICALP (1), pages 500–512, 2009. [24] Ben Green. A Szemerédi-type regularity lemma in abelian groups, with applications. Geom. Funct. Anal., 15(2):340–376, 2005. [25] Charanjit S. Jutla, Anindya C. Patthak, Atri Rudra, and David Zuckerman. Testing low-degree polynomials over prime fields. In FOCS’04: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 423–432, 2004. [26] Tali Kaufman and Dana Ron. Testing polynomials over general fields. In FOCS’04: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 413–422, 2004. [27] Tali Kaufman and Madhu Sudan. Algebraic property testing: The role of invariance. In STOC’08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 403–412, 2008. [28] Marcos Kiwi. Algebraic testing and weight distributions of codes. Theoretical Computer Science, 299(1-3):81–106, 2003. Earlier version appeared as ECCC TR97-010, 1997. [29] Dan Král, Oriol Serra, and Lluis Vena. A removal lemma for systems of linear equations over finite fields, 2008. [30] Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Tolerant property testing and distance approximation. Journal of Computer and System Sciences, 72(6):1012–1042, 2006. [31] Michal Parnas, Dana Ron, and Alex Samorodnitsky. Testing basic Boolean formulae. SIAM Journal on Discrete Mathematics, 16(1):20–46, 2003. [32] Vojtěch Rödl and Mathias Schacht. Generalizations of the removal lemma. Combinatorica, To appear. Earlier version in STOC’07. [33] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252–271, 1996. [34] Asaf Shapira. Green’s conjecture and testing linear-invariant properties. In STOC’09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 159–166, 2009. A Proof of Proposition 1 Let T1 be the tester for P1 with query complexity q1 (, n) and T2 be the tester for P2 with query complexity q2 (, n). We may assume that both T1 and T2 have soundness 1/6 with a constant blowup in their query complexity. Define T to be the tester which, on input function f , first simulates T1 and then T2 . If at least one of the two testers T1 and T2 accepts f , T accepts f . Otherwise, T rejects. Clearly the query complexity of T is O(q1 + q2 ). For completeness, note that if f is in P, then by definition f is in at least one of P1 and P2 . Thus, T accepts f with probability 1. Now suppose dist(f, P) ≥ . Then we have both dist(f, P1 ) ≥ and dist(f, P2 ) ≥ . By the union bound, the probability that at least one of T1 and T2 accepts f is at most 1/6 + 1/6 = 1/3. 13 B Proof of Proposition 2 Let T1 be the tester for P1 with query complexity q1 (), and T2 be the tester for P2 with query complexities q2 (). First we convert T1 into another tester T10 for P1 such that, on input distance parameter , T10 makes Q01 () queries, where ( q1 (x), if x < 20 ; Q01 (x) = max{q1 (x), qi ( 20 )}, otherwise. In other words, T10 can be obtained from T1 by making more queries when x is larger than 0 /2. Similarly, we can construct T20 from T2 in the same number. Since 0 is a constant, we have Q01 () = O(q1 ()) and Q02 () = O(q2 ()). Define T to be the tester that on input function f , first simulates T10 and then simulates T20 . If both testers 0 T1 and T20 accept, then T accepts f . Otherwise, it rejects. The query complexity of T is Q01 () + Q02 (), which is O(q1 () + q2 ()). For the completeness, if f ∈ P, then both f ∈ P1 and f ∈ P2 hold. Therefore, T accepts with probability at least 1. For the soundness, suppose dist(f, P) ≥ . We distinguish between two cases. Case 1. ≤ 20 . It suffices to show that f is -far from at least one of P1 or P2 . This fact then implies that T , in simulating T 0 and T20 , accepts f with probability at most 1/3. To show the f is far from at least one of the two properties, suppose not, that we have both dist(f, P1 ) < and dist(f, P2 ) < . That is, there exist g1 ∈ P1 and g2 ∈ P2 such that dist(f, g1 ) < and dist(f, g2 ) < . Since dist(f, P) ≥ , g1 , g2 ∈ / P and therefore g1 ∈ P1 \ P and g2 ∈ P2 \ P. By triangle inequality, dist(g1 , g2 ) < 2 ≤ 0 , and consequently dist(P1 \ P2 , P2 \ P1 ) < 0 , contradicting our assumption. Case 2. > 20 . There are three sub-cases depending on where f is located. We analyze each of them separately below. Note that in each of the sub-cases, f is at least 0 /2-far from one of P1 and P2 . 1. f ∈ P1 \P. Then by our assumption on the distance between P1 \P2 and P2 \P1 , dist(f, P2 \P) ≥ 0 . It follows that dist(f, P2 ) = min{dist(f, P), dist(f, P2 \ P)} ≥ min{, 0 } ≥ 0 /2. 2. f ∈ P2 \ P. Analogous to the case above, we have dist(f, Po ) ≥ 0 /2. 3. f ∈ / P1 ∪ P2 . Then by triangle inequality, max{dist(f, P1 \ P), dist(f, P2 \ P)} ≥ 0 /2. Then there is some i ∈ {1, 2} such that dist(f, Pi \ P) ≥ 0 /2. Since dist(f, P) ≥ , it follows that dist(f, Pi ) ≥ min{, 0 /2} = 0 /2. Thus, we conclude that there is some i ∈ {1, 2} such that dist(f, Pi ) ≥ 0 /2. This implies that Ti0 , which makes at least Q0i () ≥ qi (0 /2) queries, accepts f with probability at most 1/3. Hence, T accepts f with probability at most 1/3 as well, completing the proof. 14 C Proof of Proposition 3 2k We shall define a property P = {P2k }k>0 , where P2k ⊆ {0, 1}F2 is a collection of Boolean functions defined over F2k 2 , such that neither P2k nor P2k is testable. Recall that a property P is said to be testable if there is a tester for P whose query complexity is independent of the sizes of the inputs to the functions (in our case, independent of k). First, let the Hadamard encoding Had : Fk2 × Fk2 → {0, 1} be Had(α, x) = α · x. Note that F2k is isomorphic to Fk2 , so for every function g : F2k → F2k , the Hadamard concatenation of g can be written as def Had ◦ g : F2k 2 → {0, 1} where (Had ◦ g)(x, y) = Had(g(x), y). We now define P2k as follows. Let f ∈ P2k if there exists a polynomial p : F2k → F2k of degree at most 2k−1 − 1 such that dist(f, Had ◦ p) < 1/8. An important fact is that if g : F2k → F2k is a polynomial of degree 2k−1 , then Had ◦ g is not in P2k . To see this, note that by the Schwartz-Zippel Lemma, if q : F2k → F2k is a polynomial of degree at most 2k−1 , then Prx [q(x) = 0] ≤ 1/2. Therefore, for any polynomial p of degree at most 2k−1 − 1, dist(p, g) ≥ 1/2. This implies that dist(Had ◦ p, Had ◦ g) ≥ 1/4, since the Hadamard encoding has relative distance 1/2.8 Since the Hadamard encoding of g is at least 1/4far from the Hadamard encoding of any degree 2k−1 − 1 polynomials, by construction of P2k , Had ◦ g is at least 1/8-far from P , i.e., Had ◦ g ∈ P2k . Now we show that neither P2k nor its complement is testable for any distance parameter < 1/8. By polynomial interpolation, for every set of 2k−1 − 1 points, there exists a polynomial of degree 2k−1 − 1 that agrees with g on these points. So any tester that distinguishes between members of P2k and members at least -far away from P2k needs at least 2k−1 − 1 queries. Similarly, as we have just shown that Had ◦ g ∈ P2k when g is a degree-2k−1 polynomial, it follows that any tester that distinguishes between members of P2k and functions at least -far away from P2k also need at least 2k−1 queries. To conclude, we have shown a property P defined over domains of sizes |D| = 22k but testing P or P both require Ω(2k ) = Ω(|D|1/2 ) queries. Thus, neither the property or its complement is testable with a query complexity independent of the sizes of the domains, completing the proof. D Proof of Proposition 4 Let T1 be the tester for P1 with query complexity q1 () and let T2 be the tolerant tester for P2 with query complexity q2 (1 , 2 ). First we convert T1 into another tester T10 such that, on input distance parameter , T10 makes Q01 () queries, where ( q1 (x), if x < 1 ; Q01 (x) = max{q1 (x), qi (1 )}, otherwise. Set P = P1 \ P2 and define its tester T as follows: on input function f , T first simulates T1 and then T2 . T accepts iff T1 accepts and T2 rejects. Since 1 is a constant, Q01 ()) = O(), and T has query complexity O(q1 + q2 ). For completeness, if f ∈ P, then by assumption f ∈ P1 and dist(f, P2 ) ≥ 0 > 2 . This implies that T1 always rejects f , T2 accepts f with probability at most 1/3, and thus by a union bound argument T accepts f with probability at least 2/3. 8 In other words, suppose x ∈ F2k satisfies that p(x) 6= g(x). Then the number of y’s such that Had(p(x), y) 6= Had(g(x), y) is exactly 2k−1 . 15 For soundness, suppose dist(f, P) ≥ . We consider two cases and note that in both of them, T accepts f with probability at most 1/3. Case 1. dist(f, P2 ) ≤ 1 . Since T2 is a tolerant tester, T2 rejects f with probability at most 1/3. Thus, T accepts with probability at most 1/3 as well. Case 2. dist(f, P2 ) > 1 . Since P1 is the union of P and P2 , we can conclude that dist(f, P1 ) = min{dist(f, P), dist(f, P2 )}, which is at least min{, 1 }. Since T10 makes at least max{q1 (), q1 (1 )} queries, we know that T10 accepts f with probability at most 1/3, and hence, T accepts f with probability at most 1/3 as well. 16