MMD Aggregated Two-Sample Test

Authors: Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton

Abstract: We propose two novel nonparametric two-sample kernel tests based on the Maximum Mean Discrepancy (MMD). First, for a fixed kernel, we construct an MMD test using either permutations or a wild bootstrap, two popular numerical procedures to determine the test threshold. We prove that this test controls the probability of type I error non-asymptotically. Hence, it can be used reliably even in settings with small sample sizes as it remains well-calibrated, which differs from previous MMD tests which only guarantee correct test level asymptotically. When the difference in densities lies in a Sobolev ball, we prove minimax optimality of our MMD test with a specific kernel depending on the smoothness parameter of the Sobolev ball. In practice, this parameter is unknown and, hence, the optimal MMD test with this particular kernel cannot be used. To overcome this issue, we construct an aggregated test, called MMDAgg, which is adaptive to the smoothness parameter. The test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which results in a loss of test power), or arbitrary kernel choices such as the median heuristic. We prove that MMDAgg still controls the level non-asymptotically, and achieves the minimax rate over Sobolev balls, up to an iterated logarithmic term. Our guarantees are not restricted to a specific type of kernel, but hold for any product of one-dimensional translation invariant characteristic kernels.
We provide a user-friendly parameter-free implementation of MMDAgg using an adaptive collection of bandwidths. We demonstrate that MMDAgg significantly outperforms alternative state-of-the-art MMD-based two-sample tests on synthetic data satisfying the Sobolev smoothness assumption, and that, on real-world image data, MMDAgg closely matches the power of tests leveraging the use of models such as neural networks.

Submitted 21 August, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

Comments: 81 pages

Journal ref: Journal of Machine Learning Research 24(194), 1-81, 2023

arXiv:1806.02460 [pdf, other]

The effect of the choice of neural network depth and breadth on the size of its hypothesis space

Authors: Lech Szymanski, Brendan McCane, Michael Albert

Abstract: We show that the number of unique function mappings in a neural network hypothesis space is inversely proportional to $\prod_lU_l!$, where $U_{l}$ is the number of neurons in the hidden layer $l$.

Submitted 6 June, 2018; originally announced June 2018.

arXiv:1505.06129 [pdf, other]

A Distribution Free Unitary Events Method based on Delayed Coincidence Count

Authors: Mélisande Albert, Yann Bouret, Magalie Fromont, Patricia Reynaud-Bouret

Abstract: We investigate several distribution free dependence detection procedures, mainly based on bootstrap principles and their approximation properties. Thanks to this study, we introduce a new distribution free Unitary Events (UE) method, named Permutation UE, which consists in a multiple testing procedure based on permutation and delayed coincidence count. Each involved single test of this procedure achieves the prescribed level, so that the corresponding multiple testing procedure controls the False Discovery Rate (FDR), and this with as few assumptions as possible on the underneath distribution. Some simulations show that this method outperforms the trial-shuffling and the MTGAUE method in terms of single levels and FDR, for a comparable amount of false negatives. Application on real data is also provided.

Submitted 22 May, 2015; originally announced May 2015.

Comments: 45 pages, 8 figures

MSC Class: 62P10; 62G10

ACM Class: G.3; J.3

Showing 1–3 of 3 results for author: Albert, M