Property Testing

A Tutorial on
Property Testing
Dana Ron
Tel Aviv University
Property Testing (Informal Definition)
For a fixed property P and any object O,
determine whether O has property P,
or whether O is far from having property P
(i.e., far from any other object having P ).
Task should be performed by querying the object (in
as few places as possible).
Examples
• The object can be a graph (represented by its adjacency
matrix), and the property can be 3-colorability.
• The object can be a string and the property can be
membership in a given regular language L.
• The object can be a function and the property can be
linearity.
Context
Property testing can be viewed as:
• A relaxation of exactly deciding whether the object has
the property.
• A relaxation of learning the object.
In either case want testing algorithm to be significantly
more efficient than decision/learning algorithm.
When can Property Testing be Useful?
• Object is too large to even fully scan, so must make
approximate decision.
• Object is not too large but
(1) Exact decision is NP-hard (e.g. coloring)
(2) Prefer sub-linear approximate algorithm to
polynomial exact algorithm.
• Use Testing as preliminary step to exact decision or
learning. In first case can quickly rule out object far
from property. In second case can aid in efficiently
selecting good hypothesis class.
Property Testing - Background
• Initially defined by Rubinfeld and Sudan in the
context of Program Testing (of algebraic functions).
• Goldreich, Goldwasser, and Ron initiated study of
testing properties of graphs.
• Growing body of work deals with properties of
functions, graphs, strings, sets of points ...
Many algorithms with complexity that is sub-linear in
(or even independent of) size of object.
Talk Organization
Will discuss four topics:
• Testing Algebraic Properties of Functions:
Linearity Testing [BLR]
• Testing “Basic” (non-algebraic) Properties of Functions:
Singletons, Monomials, small DNF [PRS]
• Testing Graph Properties:
Testing Bipartiteness [GGR]
• Testing Properties of strings:
Testing Membership in Regular Languages [AKNS]
Testing Algebraic Properties of
Functions: Linearity Testing [BLR]
Linearity Testing
Def1: Let $F$ be a finite field. A function $f : F^m \to F$ is
called linear (multi-linear) if there exist constants
$a_1,\dots,a_m \in F$ s.t. for every $x = (x_1,\dots,x_m) \in F^m$ it holds that
$f(x) = \sum_{i=1}^{m} a_i x_i$.
Def2: A function $f$ is said to be $\varepsilon$-far from linear if for
every linear function $g$, $\mathrm{dist}(f,g) > \varepsilon$, where
$\mathrm{dist}(f,g) = \Pr[f(x) \neq g(x)]$ ($x$ selected uniformly in $F^m$).
Fact: A function $f : F^m \to F$ is linear iff for every $x,y \in F^m$
it holds that $f(x)+f(y)=f(x+y)$.
Linearity Testing Cont’
Linearity Test (Input: $F$, $m$, $\varepsilon$)
1) Uniformly and independently select $\Theta(1/\varepsilon)$ pairs of
elements $x,y \in F^m$.
2) For every pair $x,y$ selected, verify that $f(x)+f(y) = f(x+y)$.
3) If for any of the pairs selected linearity is violated (i.e.,
$f(x)+f(y) \neq f(x+y)$), then REJECT, otherwise ACCEPT.
Observe: If $f$ is linear then the test accepts w.p. 1.
Theorem: If $f$ is $\varepsilon$-far from linear then with probability
at least 2/3 the test rejects it.
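The three steps of the test can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes $F = GF(2)$, encodes a point of $F^m$ as an m-bit integer (so addition is XOR), and picks an arbitrary constant c = 3 inside the $\Theta(1/\varepsilon)$ bound. The names blr_linearity_test and parity_of are hypothetical.

```python
import random

def blr_linearity_test(f, m, eps, rng=None, c=3):
    """Return True (ACCEPT) or False (REJECT) for f: {0,1}^m -> {0,1}.

    Points are m-bit integers; addition in GF(2)^m is bitwise XOR.
    The constant c in the number of trials is an assumption.
    """
    rng = rng or random.Random()
    trials = max(1, int(c / eps))  # Theta(1/eps) pairs
    for _ in range(trials):
        x = rng.getrandbits(m)
        y = rng.getrandbits(m)
        # Over GF(2), f(x) + f(y) is f(x) XOR f(y).
        if (f(x) ^ f(y)) != f(x ^ y):
            return False  # found a violating pair
    return True

def parity_of(mask):
    """Linear function x -> sum of the bits of x selected by mask (mod 2)."""
    return lambda x: bin(x & mask).count("1") % 2
```

A linear function such as parity_of(0b1011) is accepted on every run, since no pair can be violating; a function like the AND of two input bits is rejected with high probability once enough pairs are sampled.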
Linearity Testing Cont’
Proof (of special case): Let $\varepsilon(f)$ denote the distance of $f$ to the
closest linear function $g$. Assume $1/2 - \varepsilon(f)$ is bounded below by a positive constant.
Let $G=\{x : f(x)=g(x)\}$ (so that $\Pr[x \notin G] = \varepsilon(f) > \varepsilon$).
Say that $x$ and $y$ are a violating pair if $f(x)+f(y) \neq f(x+y)$.
Observation: for any $x, y$, if among the 3 elements
$x$, $y$, $x+y$ we have 2 in $G$ and 1 not in $G$, then $x,y$ are a
violating pair.
Consider one of the 3 (disjoint) events. Can show:
$\Pr[x \notin G,\ y \in G,\ (x+y) \in G] \ge \varepsilon(f) \cdot (1 - 2\varepsilon(f))$.
Since events are disjoint, prob. of violating pair is at least
$3\varepsilon(f) \cdot (1 - 2\varepsilon(f)) = 6\varepsilon(f) \cdot (1/2 - \varepsilon(f)) = \Omega(\varepsilon)$.
Since test takes $\Theta(1/\varepsilon)$ pairs $x,y$, will reject w.h.p. $\square$
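The last step of the special-case proof can be spelled out as a short calculation. The symbols $\delta$ (a lower bound on the probability that one sampled pair is violating), $k$ (the number of sampled pairs), and $c$ (the assumed constant lower bound on $1/2 - \varepsilon(f)$) are introduced here only for illustration.

```latex
\[
\delta \;=\; 6\,\varepsilon(f)\Bigl(\tfrac12 - \varepsilon(f)\Bigr) \;\ge\; 6c\,\varepsilon ,
\qquad
\Pr[\text{ACCEPT}] \;\le\; (1-\delta)^{k} \;\le\; e^{-\delta k} \;\le\; \tfrac13
\quad\text{whenever}\quad k \;\ge\; \frac{\ln 3}{\delta}.
\]
```

So $k = \ln 3/(6c\,\varepsilon) = \Theta(1/\varepsilon)$ independent pairs suffice for rejection with probability at least 2/3.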
Linearity Testing Cont’
How do we deal with the general case (where $\varepsilon(f)$ is not
necessarily bounded away from 1/2)?
In order to prove that if $\varepsilon(f) > \varepsilon$ then reject w.p. $\ge 2/3$,
prove contrapositive: if accept w.p. $> 1/3$ (i.e., small
fraction of violating pairs) then $f$ is $\varepsilon$-close to linear. That
is, exists linear $g$ s.t. $\mathrm{dist}(f,g) \le \varepsilon$.
Specifically (taking $F = GF(2)$), define $g$ as follows:
$g(x) = 1$ if $\Pr_y[f(x+y)-f(y)=1] \ge 1/2$
$g(x) = 0$ if $\Pr_y[f(x+y)-f(y)=0] > 1/2$
Can prove that if fraction of violating pairs (w.r.t. $f$) is
sufficiently small then $f$ is close to $g$ and $g$ is linear.
Note: definition of g allows for Self-Correcting of f (for every x
can determine g(x) w.h.p by few queries to f).
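The self-correcting idea in the note can be sketched as a majority vote. This is an illustrative sketch under stated assumptions: $F = GF(2)$, points are m-bit integers with XOR as addition, and the sample size 31 is an arbitrary odd constant chosen here, not a value from the talk. The name self_correct is hypothetical.

```python
import random

def self_correct(f, x, m, samples=31, rng=None):
    """Estimate g(x) by a majority vote of f(x+y) - f(y) over random y.

    Over GF(2), both + and - are XOR. Each vote makes two queries to f.
    An odd number of samples avoids ties.
    """
    rng = rng or random.Random()
    ones = 0
    for _ in range(samples):
        y = rng.getrandbits(m)
        ones += f(x ^ y) ^ f(y)
    # Mirrors the definition: g(x) = 1 iff the vote "1" has weight >= 1/2.
    return 1 if 2 * ones >= samples else 0
```

If f agrees with a linear g on all but a small fraction of points, each vote equals g(x) unless y or x+y hits a corrupted point, so the majority recovers g(x) with high probability even at points where f itself is wrong.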
Testing “Basic” Properties of Functions:
Singletons, Monomials, small DNF [PRS]
Testing “Basic” Properties of Functions:
This work considers “the most basic” function classes:
• Singletons: $f(x) = x_i$
• Monomials: $f(x) = x_i \wedge x_j \wedge x_k$
• DNF: $f(x) = (x_i \wedge x_j) \vee (x_k \wedge x_\ell \wedge x_m)$
Testing “Basic” Properties of Functions Cont’
• Can test whether $f$ is a singleton using $O(1/\varepsilon)$ queries.
• Can test whether $f$ is a monomial using $O(1/\varepsilon)$ queries.
• Can test whether $f$ is a monotone DNF with at most $t$
terms using $\tilde{O}(t^2/\varepsilon)$ queries.
Common theme: no dependence in query complexity on
size of input, $n$, and polynomial dependence on distance
parameter, $\varepsilon$.
Learning Boolean Formulae
Basic observation: (proper) learning implies testing.
[Figure: function $f$, hypothesis $h$, and class $F$, with distance $\varepsilon$ marked]
• Can learn singletons and monomials under uniform
distribution using $O(\log n / \varepsilon)$ queries [BEHW].
• Can properly learn monotone DNF with $t$ terms and $r$ literals
using $\tilde{O}(r \cdot \log^2 n / \varepsilon + t(r + 1/\varepsilon))$ queries [A+BJT].
Main difference w.r.t testing results: no dependence on n and
different algorithmic approach.
Testing (Monotone) Singletons
Singletons satisfy:
(1) $\Pr[f(x) = 1] = 1/2$
(2) $f(x \wedge y) = f(x) \wedge f(y)$ $\forall x, y$
Natural test: check, by sampling, that conditions hold
(approximately).
Can analyze natural test for case that distance between
function and class of singletons is not too big (bounded
away from 1/2).
Testing Singletons II - Parity Testing
Observation: Singletons are a special case of parity functions
(i.e., functions of the form $g(x) = \bigoplus_{i \in S} x_i$).
Claim: Let $g(x) = \bigoplus_{i \in S} x_i$. If $|S| \ge 2$ then
$\Pr[g(x \wedge y) \neq g(x) \wedge g(y)] \ge 1/4$.
Modified algorithm:
(1) Test whether $f$ is a parity function (with dist. par. $\varepsilon$) using
algorithm of [BLR].
(2) Uniformly select constant number of pairs $x,y$ and check
whether any is a violating pair (i.e., $f(x \wedge y) \neq f(x) \wedge f(y)$).
Testing Singletons III - Self Correcting
This “almost works”: If f is singleton - always accepted.
If $f$ is $\varepsilon$-far from parity - rejected w.h.p.
But if $f$ is $\varepsilon$-close to parity function $g$, then cannot simply
apply claim to argue that many violating pairs w.r.t. $f$.
If we could only test violations w.r.t. g instead of f ...
Use Self-Corrector of [BLR] to “fix” f into parity function (g),
and then test violations on self-corrected version.
Testing Singletons IV - The Algorithm
Final Algorithm for Testing Singletons:
(1) Test whether $f$ is a parity function with dist. par. $\varepsilon$ using
algorithm of [BLR].
(2) Uniformly select constant number of pairs $x,y$. Verify that
$\text{Self-Cor}(f,x) \wedge \text{Self-Cor}(f,y) = \text{Self-Cor}(f, x \wedge y)$.
(3) Verify that $\text{Self-Cor}(f, \vec{1}) = 1$.
Testing Monomials and Monotone DNF
Monomial testing algorithm has similar structure to Singleton
testing algorithm. (Here too suffice to find test for monotone
monomials.)
The first stage of linearity testing is replaced by Affinity Testing:
if $f$ is a monomial then $F_1=\{x : f(x)=1\}$ is an affine subspace.
[Fact: $H$ is an affine subspace iff $\forall x,y,z \in H$, $x \oplus y \oplus z \in H$].
Affinity test is similar to parity test: select $x,y \in F_1$, $z \in \{0,1\}^n$,
verify that $f(x \oplus y \oplus z) = f(x) \oplus f(y) \oplus f(z)$.
The second stage is as in singleton test (check for violating
pairs). Here affinity adds structure that helps analyze second
stage.
Testing monotone DNF: use monomial test as sub-routine (a
monotone DNF function is a disjunction of monotone monomials).
Testing Graph Properties [GGR]
Testing Graph Properties
Assume graphs are represented by their adjacency matrix.
In this model, testing algorithm can perform queries:
“is there an edge between u and v”.
Distance between graphs: fraction of entries in adjacency
matrix on which they differ.
This model most appropriate for testing dense graphs.
[Figure: adjacency matrix; the entry for the pair (u,v) is 1]
Results for Testing Graph Properties
In Adjacency-Matrix model
• Can test: Bipartiteness, k-colorability, r-Clique, r-Cut and a
more general family of partition problems, with sample
complexity $\mathrm{poly}(1/\varepsilon)$ and running time $\exp(\mathrm{poly}(1/\varepsilon))$, both
independent of size of graph [GGR].
• Can test all properties that can be formulated by first order
expression $\exists\forall$ about graphs with sample and time complexity
independent of graph size (but at “steep” cost as function of
$1/\varepsilon$) [AFKS].
• In directed graphs can test acyclicity with sample and time
complexity $\mathrm{poly}(1/\varepsilon)$ [BR] (special case treated in [EKKRV]).
In Incidence-Lists model
Connectivity, k-edge-connectivity: complexity $\mathrm{poly}(1/\varepsilon)$ [GR1],
Bipartiteness: $\mathrm{poly}(1/\varepsilon) \cdot |V|^{1/2}$ [GR2], Diameter: $\mathrm{poly}(1/\varepsilon)$ [PR].
Testing Bipartiteness
Def: Graph G=(V,E) is bipartite i.f.f. can partition vertices
into two subsets V1 and V2 s.t. there are no edges
between vertices that are both in V1 or both in V2.
Recall that can decide whether graph is bipartite in time
O(|V|+|E|) by Breadth First Search (BFS). However, we
want very fast approximate decision.
Furthermore, can extend algorithm and analysis to testing
k-colorability (which is NP-Hard).
Testing Bipartiteness Cont’
Bipartite Testing Algorithm
• Uniformly and independently select
$m=\Theta(\log(1/\varepsilon)/\varepsilon^2)$ vertices in graph.
• For every pair of vertices selected query whether
there is an edge between the two, obtaining
induced sub-graph.
• Perform a BFS to determine whether induced
subgraph is bipartite. If it is output accept, o.w.
output reject.
Query complexity and running time of algorithm:
$O(\log^2(1/\varepsilon)/\varepsilon^4)$. Slight variant of alg yields $O(\log^2(1/\varepsilon)/\varepsilon^3)$
and [AK] have reduced to $O(\log^2(1/\varepsilon)/\varepsilon^2)$.
Correctness: If graph is bipartite then clearly always accepted.
From this point on assume graph is $\varepsilon$-far from bipartite.
Will show that rejected w.p. at least 2/3.
Analysis of Bipartiteness Testing Alg
Def: Let $X$ be a subset of vertices, and $(X_1,X_2)$ a partition of $X$.
Say that an edge $(u,v)$ is violating w.r.t. $(X_1,X_2)$ if either both
$u,v$ are in $X_1$ or both are in $X_2$.
If there are no violating edges w.r.t. $(X_1,X_2)$ then say it is a
bipartite partition.
View sample as consisting of two parts: $U$ and $S$. Show that w.h.p.,
for every partition $(U_1,U_2)$ of $U$ there is no partition $(S_1,S_2)$ of $S$,
s.t. $(U_1 \cup S_1, U_2 \cup S_2)$ is bipartite.
In other words, the sub-graph induced by sample $U \cup S$ is not bipartite.
Analysis of Bipartiteness Testing Alg Cont’
Def1: A vertex $v$ is influential if it has degree at least $(\varepsilon/4)|V|$.
Def2: A vertex $v$ is covered by subset $U$ if it has a neighbor in $U$.
Lem: W.h.p. $U$ covers all but at most $(\varepsilon/4)|V|$ of the influential vertices.
[Figure: vertex set V divided into influential, uncovered influential, and non-influential vertices, with sample U]
Analysis of Bipartiteness Testing Alg Cont’
Let $C$ be vertices covered by $U$ and let $R$ be remaining vertices.
Observe: Since $R$ contains at most all non-influential vertices,
and at most $(\varepsilon/4)|V|$ influential ones, total num of edges
incident to $R$ is at most $(\varepsilon/2)|V|^2$.
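The bound on edges incident to $R$ follows from a two-term count, written out here as a sketch. It uses only the definitions above: a non-influential vertex has degree below $(\varepsilon/4)|V|$, there are at most $|V|$ such vertices, and every vertex has degree at most $|V|$.

```latex
\[
\#\{\text{edges incident to } R\}
\;\le\;
\underbrace{|V| \cdot \tfrac{\varepsilon}{4}|V|}_{\text{non-influential vertices}}
\;+\;
\underbrace{\tfrac{\varepsilon}{4}|V| \cdot |V|}_{\text{uncovered influential vertices}}
\;=\; \tfrac{\varepsilon}{2}\,|V|^{2}.
\]
```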
Recall, graph $G$ is $\varepsilon$-far from bipartite: every partition $(V_1,V_2)$
of $V$ has $> \varepsilon|V|^2$ violating edges.
Together, above two imply that every partition of $U \cup C$ has
$> (\varepsilon/2)|V|^2$ violating edges.
Analysis of Bipartiteness Testing Alg Cont’
Consider fixed partition $(U_1,U_2)$ of $U$, and let $(C_1,C_2)$ be
partition of $C$ where neighbors of vertices in $U_1$ are put in $C_2$
and neighbors of vertices in $U_2$ are put in $C_1$.
Since $(U_1 \cup C_1, U_2 \cup C_2)$ contains $> (\varepsilon/2)|V|^2$ violating edges, this
many pairs of vertices $(v,w)$ in $C_1$ ($C_2$) have violating edge between
them.
If get such pair $(v,w)$ in sample $S$, then for every partition
$(S_1,S_2)$, partition $(U_1 \cup S_1, U_2 \cup S_2)$ contains some violating edge.
Since many such pairs, the sample $S$ contains such a pair
w.h.p. By union bound on number of partitions $(U_1,U_2)$ (at
most $2^{|U|} = \exp(\log(1/\varepsilon)/\varepsilon)$), $S$ contains such a pair for
every $(U_1,U_2)$. $\square$
Testing Other Graph (Partition) Properties
Each property (k-colorability, r-Clique, r-Cut ) has its own
“particularities” but in all cases:
• “Natural algorithm” (take small uniform sub-sample and
check induced subgraph for property) works.
• Analysis works by breaking sample into two parts: the first
part, U “forces” constraints on possible partitions of all
vertices. Second part, S, “tests” whether constraints are
satisfied.
More general results of [AFKS] (combination of partition and
forbidden subgraph properties ($\exists\forall$ properties)) also analyze
natural algorithm. Analysis builds on Szemerédi’s regularity
lemma.
Testing Properties of Strings:
Membership in Regular Languages
[AKNS]
Testing Membership in Regular Languages
For fixed regular language $L \subseteq \{0,1\}^*$, testing algorithm should
accept w.h.p. every word $w \in L$, and should reject w.h.p. every
word $w$ that differs on more than $\varepsilon n$ bits ($n=|w|$) from every
$w' \in L$ ($|w'|=n$). Algorithm can query any bit $w_i$ of $w$.
Let $M=(Q,F,q_0,\delta)$ be the (minimum) DFA that accepts $L$. Let
$G(M)$ denote directed graph induced by $M$ (that is, there is a
directed edge for every transition).
Def: Let $u=w_i \dots w_j$ be sub-word of $w$ that starts at position $i$.
Say that $u$ is feasible w.r.t. $M$ starting from $i$ if there exists a
state $q$ s.t. $q$ can be reached in $G(M)$ from $q_0$ in exactly $i-1$
steps, and there is a path of length $n-(|u|+i-1)$ in $G(M)$ from
$q' = \delta(q,u)$ to an accepting state $q_f$.
[Figure: path from $q_0$ to $q$ in $i-1$ steps, from $q$ to $q'$ reading $u$, and from $q'$ to $q_f$ in $n-(|u|+i-1)$ steps]
Testing Regular Language Cont’
Consider special case:
• Unique accepting state $q_f$;
• $Q$ can be partitioned into two parts, $C$ and $D$:
- $q_0, q_f \in C$;
- subgraph $G(C)$ strongly connected;
- no edges from $D$ to $C$.
• The GCD of cycle-lengths in $G(C)$ is 1
$\Rightarrow$ There exists a constant $r$ ($=O(|Q|^2)$) s.t. $\forall q,q' \in C$,
$\forall m \ge r$, exists path of length $m$ from $q$ to $q'$.
Testing Regular Language Cont’
The Algorithm (simplified version):
• Uniformly and independently select $\Theta(r/\varepsilon)$ indices
$1 \le i \le n$.
• For each $i$ selected, check that the substring $w_i \dots w_{i+r/\varepsilon}$
is feasible.
• If any substring is infeasible then reject, otherwise accept.
Number of queries: $O(r^2/\varepsilon^2)=\mathrm{poly}(|Q|)/\varepsilon^2$ and running time
$\mathrm{poly}(|Q|)/\varepsilon^2$ (can improve to almost linear dependence on
$1/\varepsilon$).
Correctness: If $w \in L$, then always accept.
If $w$ is $\varepsilon$-far from $L$, would like to show that $w$ contains many
(short) infeasible substrings (causing rejection w.h.p).
Testing Regular Language Cont’
Prove contrapositive statement: If number of (short)
infeasible substrings in $w$ is small then $w$ is close to some $w^* \in L$.
Proof idea: partition $w$ (except first and last $r$ symbols) into
disjoint maximal feasible substrings $u_1, \dots, u_h$: each $u_j$ is
feasible, but addition of next symbol $w_k$ makes it infeasible.
[Figure: states $q_j$, $q_j'$, $q_{j+1}$, $q'_{j+1}$ in parts C and D, with substrings $u_j$, $u_{j+1}$ and symbol $w_k$]
By slightly modifying each $u_j$, can “glue” the modified
substrings together into one string $w^*$ that “does not leave C”,
and reaches $q_f$. If $h$ is small (as assumed), then $w^*$ is close to $w$.
Testing Regular Language Cont’
General case works by reducing to special case we discussed.
In particular need to decompose G(M) into its strongly
connected components, and consider how a word “moves
between them”.
This work has been extended by Newman to testing Branching
Programs of bounded width, and by Kupferman and XX to
testing Tree Automata.
Directions for Further Research
• “Biggest” open problem: Can we characterize what
properties are efficiently testable? (e.g., find a measure
analogous to VC-dimension.)
• Find families of properties that are efficiently testable.
Some such results exist for testing graph properties (e.g.
partition problems) and we have the regular languages
result.
• Extend scope of property testing.
Testing Properties of Collections of Points:
Testing of Clustering
Property Testing - Background
Properties of functions:
• Initially defined by Rubinfeld and Sudan in the context of
Program Testing. Tested algebraic properties of functions:
low-degree polynomials.
• Other work on testing algebraic properties: [BLR,R,EKKRV...].
• Non-algebraic properties: Monotonicity [GGLRS,DGLRSS,B,FN].
Properties of other objects:
• Main focus: Graph properties: [GGR,GR,AK,AFKS,BR,PR,CS...]
• Growing body of work deals with properties of
strings [AKNS,N,PRR], sets of points [PR], geometric objects
[CSZ], distributions [BFRW], and more.
All algorithms have complexity that is sub-linear in
(or even independent of) size of object.