
Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. II

Authors: Sushma Kumari, Vladimir G. Pestov

Abstract: We continue to investigate the $k$ nearest neighbour ($k$-NN) learning rule in complete separable metric spaces. Thanks to the results of Cérou and Guyader (2006) and Preiss (1983), this rule is known to be universally consistent in every such metric space that is sigma-finite dimensional in the sense of Nagata. Here we show that the rule is strongly universally consistent in such spaces in the absence of ties. Under the tie-breaking strategy applied by Devroye, Györfi, Krzyżak, and Lugosi (1994) in the Euclidean setting, we manage to show the strong universal consistency in non-Archimedean metric spaces (that is, those of Nagata dimension zero). Combining the theorem of Cérou and Guyader with results of Assouad and Quentin de Gromard (2006), one deduces that the $k$-NN rule is universally consistent in metric spaces having finite dimension in the sense of de Groot. In particular, the $k$-NN rule is universally consistent in the Heisenberg group, which is not sigma-finite dimensional in the sense of Nagata, as follows from an example independently constructed by Korányi and Reimann (1995) and Sawyer and Wheeden (1992).

Submitted 20 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Latex 2e, 27 pages, 1 figure. Minor revisions to conform with the last set of journal page proofs: two typos corrected, the bibliography rearranged in the order of citations (the ESAIM:PS home style), and two articles that were no longer cited removed

MSC Class: 62H30; 54F45

Journal ref: ESAIM Probability & Statistics 28(2024), 132-160

arXiv:2005.01886

A learning problem whose consistency is equivalent to the non-existence of real-valued measurable cardinals

Authors: Vladimir G. Pestov

Abstract: We show that the $k$-nearest neighbour learning rule is universally consistent in a metric space $X$ if and only if it is universally consistent in every separable subspace of $X$ and the density of $X$ is less than every real-measurable cardinal. In particular, the $k$-NN classifier is universally consistent in every metric space whose separable subspaces are sigma-finite dimensional in the sense of Nagata and Preiss if and only if there are no real-valued measurable cardinals. The latter assumption is relatively consistent with ZFC; however, the consistency of the existence of such cardinals cannot be proved within ZFC. Our results were inspired by an example sketched by Cérou and Guyader in 2006 at an intuitive level of rigour.

Submitted 4 May, 2020; originally announced May 2020.

Comments: 16 pp., journal macros

MSC Class: 62H30; 54F45; 03E55

ACM Class: I.2.6

Journal ref: Addendum was revised and published as a separate paper: On a result of K. P. Hart about non-existence of measurable solutions to the discrete expectation maximization problem, Comment. Math. Univ. Carolin. 64 (2023), 353--358

arXiv:2003.00894

doi 10.1051/ps/2020018

Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension

Authors: Benoît Collins, Sushma Kumari, Vladimir G. Pestov

Abstract: The $k$ nearest neighbour learning rule (under the uniform distance tie breaking) is universally consistent in every metric space $X$ that is sigma-finite dimensional in the sense of Nagata. This was pointed out by Cérou and Guyader (2006) as a consequence of the main result by those authors, combined with a theorem in real analysis sketched by D. Preiss (1971) (and elaborated in detail by Assouad and Quentin de Gromard (2006)). We show that it is possible to give a direct proof along the same lines as the original theorem of Charles J. Stone (1977) about the universal consistency of the $k$-NN classifier in the finite dimensional Euclidean space. The generalization is non-trivial because distance ties are more prevalent in the non-Euclidean setting, and on the way we investigate the relevant geometric properties of the metrics and the limitations of the Stone argument by constructing various examples.

Submitted 14 June, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

Comments: 21 pp., 2 figures, LaTeX with ESAIM: Probability and Statistics macros, a version with the two anonymous referees' comments taken into account

MSC Class: 62H30; 54F45

Journal ref: ESAIM: Probability and Statistics 24 (2020), 914--934

arXiv:1910.06820

Elementos da teoria de aprendizagem de máquina supervisionada

Authors: Vladimir G. Pestov

Abstract: This is a set of lecture notes for an introductory course (advanced undergraduates or a first graduate course) on foundations of supervised machine learning (in Portuguese). The topics include: the geometry of the Hamming cube, concentration of measure, shattering and VC dimension, Glivenko-Cantelli classes, PAC learnability, universal consistency and the k-NN classifier in metric spaces, dimensionality reduction, universal approximation, sample compression. There are appendices on metric and normed spaces, measure theory, etc., making the notes self-contained.

Submitted 5 October, 2019; originally announced October 2019.

Comments: 390 pp. + vii, in Portuguese, a preliminary version, to be published by IMPA as a book of lectures of the 32nd Brazilian Math Colloquium (July 28 - Aug 2, 2019), submitted to arXiv upon IMPA permission

MSC Class: 68Q32; 62H30; 68T05; 68T10

Showing 1–4 of 4 results for author: Pestov, V G