Fast and Multiphase Rates for Nearest Neighbor Classifiers

Yang, Pengkun; Zhang, Jingzhao


arXiv:2308.08247 (stat)

[Submitted on 16 Aug 2023 (v1), last revised 3 Jun 2025 (this version, v2)]



Abstract: We study the scaling of classification error rates with respect to the size of the training dataset. In contrast to classical results where rates are minimax optimal for a problem class, this work starts with the empirical observation that, even for a fixed data distribution, the error scaling can have diverse rates across different ranges of sample size. To understand when and why the error rate is non-uniform, we theoretically analyze nearest neighbor classifiers. We show that an error scaling law can have fine-grained rates: in the early phase, the test error depends polynomially on the data dimension and decreases fast, whereas in the later phase, the error depends exponentially on the data dimension and decreases slowly. Our analysis highlights the role of the complexity of the data distribution in determining the test error. When the data are distributed benignly, we show that the generalization error of a nearest neighbor classifier can depend polynomially, instead of exponentially, on the data dimension.
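As a rough illustration of what an error scaling curve for a nearest neighbor classifier looks like, the sketch below measures the test error of a brute-force 1-nearest-neighbor rule at several training set sizes. It is not the paper's experiment: the synthetic distribution (uniform points in the unit cube, labeled by whether the first coordinate exceeds 0.5), the function names `sample`, `predict_1nn`, and `error_rate`, and all parameter values are assumptions chosen only to keep the example small and self-contained.

```python
import random


def sample(n, d, rng):
    """Draw n points uniformly from [0, 1]^d (an assumed toy distribution).

    The label is 1 when the first coordinate exceeds 0.5, else 0.
    """
    xs = [[rng.random() for _ in range(d)] for _ in range(n)]
    ys = [1 if x[0] > 0.5 else 0 for x in xs]
    return xs, ys


def predict_1nn(train_x, train_y, query):
    """Return the label of the training point nearest to `query`.

    Distance is squared Euclidean; the search is brute force.
    """
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def error_rate(n_train, d=2, n_test=500, seed=0):
    """Fraction of fresh test points misclassified by 1-NN trained on n_train points."""
    rng = random.Random(seed)
    train_x, train_y = sample(n_train, d, rng)
    test_x, test_y = sample(n_test, d, rng)
    wrong = sum(
        predict_1nn(train_x, train_y, x) != y for x, y in zip(test_x, test_y)
    )
    return wrong / n_test


if __name__ == "__main__":
    # Print one (training size, test error) pair per line.
    for n in (10, 40, 160, 640):
        print(n, error_rate(n))
```

Plotting the printed pairs on log-log axes gives an empirical scaling curve for this toy setup; varying `d` or swapping in a different labeling rule in `sample` is one way to explore how the curve depends on the data distribution.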

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as:	arXiv:2308.08247 [stat.ML]
	(or arXiv:2308.08247v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2308.08247

Submission history

From: Jingzhao Zhang
[v1] Wed, 16 Aug 2023 09:28:55 UTC (9,679 KB)
[v2] Tue, 3 Jun 2025 07:05:41 UTC (1,172 KB)
