Big Hypothesis Testing with Kernels

Dino Sejdinovic, Heiko Strathmann, Aaditya Ramdas, Soumyajit De, Wojciech Zaremba, Matthew Blaschko, Mladen Kolar and Arthur Gretton

September 26, 2014

Abstract

Embeddings of probability distributions into reproducing kernel Hilbert spaces provide a flexible framework for performing fully non-parametric two-sample tests. However, this approach generally requires time at least quadratic in the number of observations, which is often prohibitive. We give a unified framework for scaling these tests up to big datasets using block-based procedures. We demonstrate how to construct consistent tests suited to data streams or to situations where the observations cannot be stored in memory. In addition, we show how kernel selection can also be performed on the fly in order to maximize the asymptotic efficiency of these tests. A scalable implementation of the proposed testing framework is provided in the open-source machine learning toolbox Shogun.
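The block-based idea can be illustrated with a minimal sketch built on the maximum mean discrepancy (MMD), the RKHS distance between the two embedded distributions. The sketch makes several assumptions. It uses a Gaussian kernel with a fixed bandwidth. It pairs the two streams into non-overlapping blocks of equal size B and computes an unbiased quadratic-time MMD² estimate within each block. It then applies a normal approximation to the average of the block estimates. The function names (`gaussian_kernel`, `block_mmd2`, `block_mmd_test`) are hypothetical, and this is not the Shogun implementation. For n observations the cost is about nB kernel evaluations rather than n², and only one block has to be held in memory at a time.

```python
import math
import random
from itertools import islice
from statistics import NormalDist, mean, stdev


def gaussian_kernel(x, y, bandwidth=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)); x and y are sequences of floats
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def block_mmd2(xs, ys, kernel):
    # Unbiased quadratic-time MMD^2 estimate on one block of m paired samples
    m = len(xs)
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                total += (kernel(xs[i], xs[j]) + kernel(ys[i], ys[j])
                          - kernel(xs[i], ys[j]) - kernel(xs[j], ys[i]))
    return total / (m * (m - 1))


def block_mmd_test(x_stream, y_stream, block_size, kernel=gaussian_kernel, alpha=0.05):
    """Hypothetical block-based two-sample test; returns (statistic, p_value, reject)."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    pairs = zip(x_stream, y_stream)
    block_stats = []
    while True:
        # Read only one block of paired observations into memory at a time
        block = list(islice(pairs, block_size))
        if len(block) < block_size:
            break  # an incomplete final block is discarded
        xs = [x for x, _ in block]
        ys = [y for _, y in block]
        block_stats.append(block_mmd2(xs, ys, kernel))
    if len(block_stats) < 2:
        raise ValueError("need at least two complete blocks")
    # Average of block estimates: mean 0 under H0 (P = Q), approximately normal
    statistic = mean(block_stats)
    std_err = stdev(block_stats) / math.sqrt(len(block_stats))
    if std_err == 0.0:
        p_value = 0.0 if statistic > 0 else 1.0
    else:
        # One-sided test: a large MMD^2 is evidence that the distributions differ
        p_value = 1.0 - NormalDist().cdf(statistic / std_err)
    return statistic, p_value, p_value < alpha


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0.0, 1.0)] for _ in range(2000)]
    ys = [[rng.gauss(1.0, 1.0)] for _ in range(2000)]  # mean-shifted sample
    print(block_mmd_test(iter(xs), iter(ys), block_size=20))
```

The block size trades statistical power against cost. With B = 2 the statistic reduces to a linear-time MMD estimate. Letting a single block cover all the data recovers the quadratic-time unbiased estimate, but the normal approximation then no longer applies. The per-block values are kept in a list here for clarity. A running mean and variance would make the memory footprint independent of the number of blocks.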