High-dimensional and universally consistent k-sample tests

Panda, Sambit; Shen, Cencheng; Perry, Ronan; Zorn, Jelle; Lutz, Antoine; Priebe, Carey E.; Vogelstein, Joshua T.

arXiv:1910.08883v4 (stat)

[Submitted on 20 Oct 2019 (v1), revised 11 Oct 2023 (this version, v4), latest version 14 Sep 2024 (v5)]

Abstract: The k-sample testing problem involves determining whether $k$ groups of data points are each drawn from the same distribution. The standard method for k-sample testing in biomedicine is multivariate analysis of variance (MANOVA), even though it depends on strong, and often unsuitable, parametric assumptions. Moreover, independence testing and k-sample testing are closely related, and several universally consistent high-dimensional independence tests, such as distance correlation (Dcorr) and the Hilbert-Schmidt Independence Criterion (Hsic), enjoy solid theoretical and empirical properties. In this paper, we prove that independence tests achieve universally consistent k-sample testing and that k-sample statistics such as Energy and Maximum Mean Discrepancy (MMD) are precisely equivalent to Dcorr. An empirical evaluation of nonparametric independence tests showed that they generally perform better than the popular MANOVA test, even in Gaussian-distributed scenarios. The evaluation included several popular independence statistics and covered a comprehensive set of simulations. Additionally, the testing approach was extended to perform multiway and multilevel tests, which were demonstrated in a simulated study as well as on real-world fMRI brain scans with a set of attributes.
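
The link between k-sample and independence testing described in the abstract can be illustrated with a minimal sketch. It assumes the common reduction in which the $k$ samples are pooled, each row is paired with a one-hot group label, and an independence statistic is computed between data and labels. The function names (`dcorr_sq`, `k_sample_test`), the biased distance correlation estimator, and the permutation scheme are illustrative choices made here, not the authors' implementation.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered Euclidean distance matrix of the rows of `a`."""
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcorr_sq(x, y):
    """Biased sample squared distance correlation between paired rows of x and y."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov_sq = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(dcov_sq / denom)


def k_sample_test(samples, n_perms=200, seed=0):
    """Permutation k-sample test via independence between data and group labels.

    `samples` is a list of 2-D arrays (rows are observations) with equal column counts.
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack(samples)
    labels = np.repeat(np.arange(len(samples)), [len(s) for s in samples])
    onehot = np.eye(len(samples))[labels]  # one row per observation, one column per group
    observed = dcorr_sq(pooled, onehot)
    # Shuffling the labels breaks any dependence between data and group membership.
    null = [dcorr_sq(pooled, onehot[rng.permutation(len(labels))]) for _ in range(n_perms)]
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perms)
    return observed, p_value
```

Calling `k_sample_test([x1, x2, x3])` on three arrays of shape `(n_i, d)` returns the statistic and a p-value; a small p-value suggests that at least one group differs in distribution. The sketch recomputes the pooled distance matrix for every permutation, which keeps the code short at the cost of speed.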

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1910.08883 [stat.ML]
	(or arXiv:1910.08883v4 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1910.08883

Submission history

From: Sambit Panda
[v1] Sun, 20 Oct 2019 03:14:20 UTC (17 KB)
[v2] Fri, 18 Dec 2020 23:54:01 UTC (1,541 KB)
[v3] Fri, 2 Apr 2021 01:35:09 UTC (1,553 KB)
[v4] Wed, 11 Oct 2023 17:14:41 UTC (1,794 KB)
[v5] Sat, 14 Sep 2024 17:36:56 UTC (990 KB)
