Testing Closeness of Distributions

Testing Collections of Properties Reut Levi Dana Ron Ronitt Rubinfeld ICS 2011 Shopping distribution What properties do your distributions have? Testing closeness of two distributions: Transactions in California Transactions in New York trend change? Testing Independence: Shopping patterns: Independent of zip code? This work: Many distributions One distribution:  D is arbitrary black-box distribution over [n], generates iid samples. D samples  Sample complexity in terms Test Pass/Fail? of n? (can it be sublinear?) Some answers…  Uniformity (n1/2) [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08]  Identity (n1/2) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01]  Closeness (n2/3) [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08]  Independence O(n12/3 n21/3), (n12/3 n21/3) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] , this work  Entropy n1/β^2+o(1) [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08]  Support Size (n/logn) [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10]  Monotonicity on total order (n1/2) [Batu, Kumar, Rubinfeld 04]  Monotonicity on poset n1-o(1) [Bhattacharyya, Fischer, Rubinfeld, Valiant 10] Collection of distributions: Further refinement: Known or unknown distribution on i’s? D1 D2 … Dm samples  Two models:  Sampling model:  Get (i,x) for random i, xDi  Query model:  Get (i,x) for query i and xDi Test Pass/Fail?  Sample complexity in terms of n,m? Properties considered:  Equivalence  All distributions are equal  ``Clusterability’’  Distributions can be clustered into k clusters such that within a cluster, all distributions are close Equivalence vs. independence  Process of drawing pairs:  Draw i [m], x  Di output (i,x)  Easy fact: (i,x) independent iff Di‘s are equal Results Also yields “tight” lower bound for independence testing Def: (D1,…Dm) has the Equivalence property if Di = Di' for all 1 ≤ i, i’ ≤ m. Lower Bound Upper Bound n>m (n2/3m1/3) Õ(n2/3m1/3) Unknown Weights m>n (n1/2m1/2) Õ(n1/2m1/2) Known Weights Clusterability  Can we cluster distributions s.t. in each cluster, distributions (very) close?  Sample complexity of test is  O(kn2/3) for n = domain size, k = number of clusters  No dependence on number of distributions  Closeness requirement is very stringent Open Questions • Clusterability in the sampling model, less stringent notion of close • Other properties of collections? • E.g., all distributions are shifts of each other? Thank you

Testing Closeness of Distributions

Related documents

Products

Support

Testing Closeness of Distributions

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib