Big Hypothesis Testing with Kernels

Dino Sejdinovic, Heiko Strathmann, Aaditya Ramdas,
Soumyajit De, Wojciech Zaremba, Matthew Blaschko,
Mladen Kolar and Arthur Gretton
September 26, 2014
Abstract
Embeddings of probability distributions into reproducing kernel
Hilbert spaces provide a flexible framework to perform fully non-parametric
two-sample tests. However, this approach generally requires time (at least)
quadratic in the number of observations, which is often prohibitive. We give
a unified framework on how these tests can be scaled up to big datasets using
block-based procedures. It is demonstrated how to construct consistent tests
suited to data streams or to situations when the observations cannot be stored
in memory. In addition, we show how the kernel selection can also be performed
on-the-fly in order to maximize the asymptotic efficiency of these tests.
A scalable implementation of the proposed testing framework is provided in the
open-source machine learning toolbox Shogun.
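
To make the idea of a block-based procedure concrete, the following is a minimal sketch of one way such a test might look; it is not the Shogun implementation or its API. It assumes scalar observations, a Gaussian kernel with a fixed bandwidth, equal-sized non-overlapping blocks, and a normal approximation to the average of the per-block statistics. The function names (gaussian_kernel, mmd2_unbiased, block_mmd_test) and all parameters are hypothetical and chosen for illustration only.

```python
import math
import statistics


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars (assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased quadratic-time estimate of the squared MMD.

    Expects two samples of the same size m >= 2 and sums the usual
    kernel combination over all index pairs with i != j.
    """
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("need two samples of equal size >= 2")
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                total += (
                    gaussian_kernel(x[i], x[j], bandwidth)
                    + gaussian_kernel(y[i], y[j], bandwidth)
                    - gaussian_kernel(x[i], y[j], bandwidth)
                    - gaussian_kernel(x[j], y[i], bandwidth)
                )
    return total / (m * (m - 1))


def block_mmd_test(x, y, block_size, bandwidth=1.0, alpha=0.05):
    """Block-based two-sample test (illustrative sketch).

    Splits both samples into non-overlapping blocks, computes the
    quadratic-time statistic on each block only, and averages the
    per-block values. The average is compared with zero using a
    one-sided normal approximation whose variance is estimated from
    the spread of the per-block values.
    """
    n_blocks = min(len(x), len(y)) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two full blocks")

    estimates = []
    for b in range(n_blocks):
        lo, hi = b * block_size, (b + 1) * block_size
        # Only one block needs to be held in memory at a time.
        estimates.append(mmd2_unbiased(x[lo:hi], y[lo:hi], bandwidth))

    mean = statistics.fmean(estimates)
    std_err = statistics.stdev(estimates) / math.sqrt(n_blocks)
    z = mean / std_err
    p_value = 1.0 - statistics.NormalDist().cdf(z)
    return z, p_value, p_value < alpha
```

Calling block_mmd_test(x, y, block_size=20) on two lists of floats returns the standardized statistic, an approximate p-value, and the reject decision at level alpha. In this sketch the cost is quadratic in the block size but linear in the number of blocks, and each block can be discarded once its statistic has been computed, which is what makes this kind of procedure a candidate for streaming data.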