Property Testing in Sparse and General Graphs
Michael Krivelevich
Tel Aviv University

Graph Property Testing
Very general setting:
P = graph property to test (k-colorability, planarity, non-existence of a copy of H, etc.)
Input: graph G on n vertices, n→∞ (should be specified!)
Promise: G ∈ P (positive), or G is ε-far from P (negative)
(an ε-fraction of the description of G should be changed to get H ∈ P)
Algorithm A (typically randomized): queries the description of G
- G ∈ P ⇒ Pr[A accepts G] ≥ 2/3
- G is ε-far from P ⇒ Pr[A rejects G] ≥ 2/3
- ∀ G ∈ P, Pr[A accepts G] = 1 – one-sided error algorithm

Property Testing in Dense Graphs
- Formally defined in GGR’98 (appeared implicitly in combinatorial papers in the 70’s, 80’s)
Input graph description: adjacency matrix
G = (V,E), V = [n]
A = (a_ij)_{n×n}, where a_ij = 1 if (i,j) ∈ E(G), and a_ij = 0 otherwise
Algorithm: queries the adjacency matrix of G
Query: is (i,j) ∈ E(G)? (vertex pair query)
Distance: G is ε-far from P if ≥ εn² entries in A(G) need to be changed to get H ∈ P

Property Testing in Dense Graphs – Brief Summary
“… It’s all about REGULARITY.” (AFNS’06)
• Very strong (and fruitful) connection between property testing in dense graphs and the Szemerédi Regularity Lemma and its versions (started in AFKS’99 and culminated in AFNS’06)
• Have reached a very good understanding of this setting (though of course quite a few challenging problems remain)

Dense Graph Model - limitations
• Suitable/tailored for dense graphs only
• Degenerate for many graph properties
Ex.: P = “G is connected” - always answer “YES” (dist(G,P) ≤ n-1 << εn²)
• A typical algorithm:
- sample S ⊆ [n], |S| = O(1)
- look inside to check whether G[S] ∈ P
- a.s. G[S] is empty for |E(G)| = o(n²) ⇒ useless/irrelevant

Property Testing in Bounded Degree Graphs
Introduced by GR’97
• Assumption: Δ(input graph G) ≤ d = const; ε << 1/d
• Graph representation: by incidence lists
L(v_i) = (v_{i,1},…,v_{i,d}) – list of neighbors of v_i
• Query: who is the j-th neighbor of v_i? (neighbor query)
• Distance: G is ε-far from P if ≥ εdn modifications in incidence lists are needed to get H ∈ P

Bounded Degree Graphs – an Example
Th. (GR’97): Connectivity in the bounded degree model can be tested in O(1/ε²) queries
Proof: Assume G is ε-far from being connected
⇒ G has ≥ εn connected components
⇒ G has ≥ εn/2 connected components of size ≤ 2/ε (= small components)
⇒ ≥ ε/2 fraction of all vertices lie in small components

Property Testing in Bounded Degree Graphs (cont.)
Algorithm:
Repeat O(1/ε) times:
1. Sample a random vertex v ∈_R V
2. Explore the connected component C(v) of v until 2/ε vertices are accumulated
3. If |C(v)| ≤ 2/ε – reject
If never rejected – accept
One-sided error algorithm with complexity O(1/ε²)
More careful analysis ⇒ Õ(1/ε) queries

Testing bounded degree graphs – basic tools
• Random sampling
• Local search (exploring the neighborhood/ball of a vertex)
• Random walks (a random neighbor of a random neighbor of a random neighbor…)

Bounded degree – first results
Results from GR’97: Can test:
- connectivity:
connectivity in Õ(1/ε) queries
2-edge connectivity: Õ(1/ε²)
3-edge connectivity: Õ(1/ε³)
k-vertex connectivity, k=2,3: Õ(1/ε^k)
- one-sided error algorithms
Uses structural connectivity results (block, cactus, etc.)
- cycle-freeness in O(1/ε³) queries - two-sided error algorithm
Proof idea: G is ε-far from a forest ⇒ many small components with a cycle, or large components C_i with large surplus e(C_i)-v(C_i)

Testing bipartiteness in bounded degree graphs
P = “G is bipartite”
Lower bound (GR’97): Ω(√n) queries - in very sharp contrast to the dense case
Proof idea:
Negative distribution D_N = Hamilton cycle + random perfect matching (Ω(1)-far from being bipartite a.s.)
Positive distribution D_P = Hamilton cycle + random perfect matching between vertices of different parity
[Figure: illustrations of D_N and D_P]
Any tester: can’t distinguish between D_P, D_N before having seen a cycle
⇒ Takes Ω(√n) queries by the birthday paradox

Testing bipartiteness in bounded degree graphs (cont.)
Th. (GR’99): There is a one-sided error algorithm for testing bipartiteness in the bounded degree model in Õ(√n) queries.
Algorithm:
Repeat T = O(1/ε) times:
1. Choose a random vertex s ∈_R V
2. Perform K := Õ(√n) random walks of length L := polylog(n) starting from s
3. If the same endvertex is reached by an odd and an even path – reject
If no rejection - accept

Testing bipartiteness in bounded degree graphs (cont.)
Analysis: very elaborate
- relatively easy for the rapidly mixing case [∀ s, v: Pr[a random walk of length L starting from s ends at v] = Θ(1/n)]
- for the general case: no rapid mixing ⇒ small cut (M’89) ⇒ use such cuts to decompose the graph and the problem

Testing k-colorability
P = “G is k-colorable”; k≥3 – fixed
Obviously can be done in O(n) queries (just get all O(dn) edges of G)
Th. (BOT’02): For every fixed k ≥ 3, testing k-colorability in the bounded degree model requires Ω(n) queries
⇒ No room for sophisticated testing algorithms

Testing k-colorability (cont.)
Proof idea:
For one-sided error: can use a classical result of Erdős’62:
Th.: There exists G = (V,E), |V| = n, Δ(G) = O(1), G is ε-far from 3-colorable, but every δn edges form a 3-colorable graph
⇒ tester has to obtain ≥ δn edges to catch G_0 ⊆ G with χ(G_0) > 3
For two-sided error algorithms:
- Two distributions (positive, negative) over instances of systems of linear equations; no algorithm can distinguish between them in o(n) time
- Then: gap preserving reductions from linear equations to 3-colorability

Testing in non-expanding bounded degree graphs
Czumaj, Shapira, Sohler’07
Notion of hereditary non-expanding graphs:
Def: G is λ-expanding if for every V_0 ⊆ V(G), |V_0| ≤ n/2, |N(V_0)| ≥ λ|V_0|
Def: Graph family F is non-expanding if there exists n_0 = n_0(F) s.t. for all G ∈ F, |V(G)| ≥ n_0, G is not (1/log² n)-expanding
Ex.: F = planar graphs – non-expanding (exists separator of size O(√|V(G)|))
Use: G ∈ non-expanding family F, bounded degree ⇒ can repeatedly cut G to decompose it into constant sized pieces H_1, H_2, …, number of edges between pieces ≤ εn/2

Testing in non-expanding graphs (cont.)
Th. (CSS): P = hereditary property (closed under taking induced subgraphs, say, 3-colorability)
Assume: Input G ∈ non-expanding family F of bounded degree subgraphs
⇒ P can be tested over F in constant time f(ε)
Proof idea: Decompose G = (H_1, H_2, …) as above
G = negative instance ⇒ many of the H_i’s are witnesses ⇒ can be found by random sampling + local search

Testing planarity
Th. (BSS’08): P = “G is planar”
P can be tested in time O_ε(1) in bounded degree graphs by a 2-sided error algorithm
(proved more: every minor-closed property P is testable in constant time)
Proof idea: Local statistics in planar graphs differ substantially from those in graphs ε-far from planar
(related to hyper-finite graphs, converging sequences of sparse graphs, etc.)

Testing planarity (cont.)
Remarks:
1. Get a two-sided error algorithm, query complexity exp(exp(exp(1/ε))). Better query complexity?
2. Two-sided vs one-sided
Ex: G = bounded degree expander of high girth (Θ(log n)) (say, LPS graph)
- Θ(1)-far from planar
- every c log n edges form a forest ⇒ planar subgraph
⇒ LB = Ω(log n); can strengthen to the Ω(√n) of GR’97
Conj: P = “G is H-minor free”
P can be tested with a one-sided error algorithm in O(√n) queries

Bounded degree graphs – open questions
• Characterization of testable properties? (testable := testable in O_ε(1) queries)
or at least: wide classes of testable properties
• One-sided vs two-sided? Comparative study for various properties
• Testing in restricted graph classes? (à la CSS)
• Tolerant testing? Estimating distance to a given property?
Bounded degree model - limitations
Opposite/similar to the dense model
• Suitable/tailored only for bounded degree graphs
• Distance notion is “hardwired” – always measured w.r.t. dn
• Degenerates for certain properties (e.g. √n-colorability – always answer “YES”)

Testing in graphs of general density
- Introduced in KKR’03
Main principles:
1. Distance is measured w.r.t. the actual size of the input graph (the latter can be approximated first if necessary)
G = (V,E) is ε-far from P if ≥ ε|E| edges need to be changed to get H ∈ P (appeared already in PR’02)
2. Queries allowed:
a) vertex pair queries: is (i,j) ∈ E(G)? (as in the dense model)
b) neighbor queries: who is the j-th neighbor of i ∈ V(G)? (as in the sparse model)
c) degree queries: what is d_G(i)?
No inherent limitation on input graph density

Testing bipartiteness in general graphs
Th. (KKR’03):
1. Testing bipartiteness can be done in Õ(min(√n, n/d)) queries, where d = 2|E|/|V| is the average degree of G;
2. Lower bound of Ω(min(√n, n/d)) queries
[Figure: query complexity as a function of the average degree d, with marks at √n and n]
- continuous interpolation between the sparse and the dense cases

Testing bipartiteness for general graphs - proofs
Upper bound: Case d ≤ √n – same as in the bounded degree model
K := O_ε(√n), L := polylog_ε(n)
Repeat T = O(1/ε) times:
1. Choose a random vertex s ∈_R V
2. Perform K random walks of length L starting from s
3. A_0 = endpoints of walks corresponding to paths of even length
A_1 = endpoints of walks corresponding to paths of odd length
4. If A_0 ∩ A_1 ≠ Ø – reject, found an odd cycle
Never rejected - accept

Testing bipartiteness for general graphs – proofs (cont.)
Upper bound: Case d ≥ √n
Now: K := O_ε(√(n/d)), L := polylog_ε(n)
A_0, A_1 – as before
Check whether A_0 or A_1 spans an edge (here use vertex pair queries)
If this happens – reject
Never happens - accept

Testing bipartiteness for general graphs – proofs (cont.)
Lower bound:
Negative distribution D_N = G_{n,d} – random d-regular graph
Positive distribution D_P = G_{n/2,n/2,d} – random bipartite d-regular graph
- choose an equipartition V = (V_1,V_2) u.a.r.
- construct a random d-regular bipartite graph between V_1, V_2
Proof idea: ALG = arbitrary algorithm
• o(n/d) vertex pair queries ⇒ a.s. do not produce an edge
• have seen o(√n) vertices ⇒ a.s. no neighbor query closes a cycle (birthday paradox)
o(min(n/d, √n)) queries ⇒ both items apply, can’t distinguish between D_P, D_N

Testing triangle-freeness in general graphs
Result of AKKR’06
Property P to test = “G is K_3-free”
Most interesting part – Lower Bound
d := average degree of the input graph
• d ≤ n^{1-δ(n)}, δ(n)→0 ⇒ Ω(n^{1/3}) queries are needed
• d = Θ(n) ⇒ O_ε(1) queries are enough (AFKS’99)
Threshold-like behavior for query complexity, abrupt change around d = Θ(n)
Proof idea: Cayley graphs, set of generators – random subset of a dense 3AP-free set (cf. A’02 for the dense case)

Comparative study of strength of different query types - BKKR’08
Test case: k-colorability, k≥3 fixed
Models to compare:
- vertex pair queries
- neighbor queries
- combined model (pair + neighbor queries)
- new query type – group query
Group query: v ∈ V - vertex, S – vertex subset
Is there an edge between v and S in G? YES/NO
(and then can find a random edge between v and S in O(log n) queries if needed)
motivated by Group Testing

Comparative study of strength of different query types - results
On the qualitative level:
• vertex pair, neighbor < combined model < group query
Say, in testing bipartiteness
• vertex pair queries are better for dense graphs, neighbor queries are better for sparse graphs
• for group queries: UB = O(n/d), LB = Ω(n/d)
(d := average degree of the input graph)

Testing general graphs – open problems
• Results for (other) concrete problems? (testing H-freeness, k-colorability, etc.)
• Develop technology for proving lower bounds
• One-sided vs two-sided error algorithms?
• What if given the ability to sample a random edge? (to eliminate hiding small dense hard instances)
• Further query types, their comparison? Query types driven by practical applications?