Intrinsic Dimension for Large-Scale Geometric Learning

Authors: Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider

Abstract: The concept of dimension is essential to grasp the complexity of data. A naive approach to determine the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID) that employs more complex feature functions, e.g., distances between data points. Yet, many of these approaches are based on empirical observations, cannot cope with the geometric character of contemporary datasets, and lack an axiomatic foundation. A different approach was proposed by V. Pestov, who links the intrinsic dimension axiomatically to the mathematical concentration of measure phenomenon. First methods to compute this and related notions for ID were computationally intractable for large-scale real-world datasets. In the present work, we derive a computationally feasible method for determining said axiomatic ID functions. Moreover, we demonstrate how the geometric properties of complex data are accounted for in our modeling. In particular, we propose a principled way to incorporate neighborhood information, as in graph data, into the ID. This allows for new insights into common graph learning procedures, which we illustrate by experiments on the Open Graph Benchmark.

Submitted 17 April, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: 18 pages, 4 tables, 3 figures. This is the version accepted to TMLR, see: https://openreview.net/forum?id=85BfDdYMBY

Journal ref: Transactions on Machine Learning Research, 2023

arXiv:2208.04912 [pdf, other]

doi 10.1016/j.jmaa.2022.126591

An Application of Farkas' Lemma to Finite-Valued Constraint Satisfaction Problems over Infinite Domains

Authors: Friedrich Martin Schneider, Caterina Viola

Abstract: We show a universal algebraic local characterisation of the expressive power of finite-valued languages with domains of arbitrary cardinality and containing arbitrarily many cost functions.

Submitted 10 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: 18 pages. The paper is based on a chapter from Caterina Viola's doctoral dissertation. This is a preprint of a manuscript accepted for publication in Journal of Mathematical Analysis and Applications (JMAA)

MSC Class: 46Axx ACM Class: G.0; F.2.0

Journal ref: J. Math. Anal. Appl. 517 (2023) 126591

arXiv:1805.05714 [pdf, other]

Intrinsic dimension and its application to association rules

Authors: Tom Hanika, Friedrich Martin Schneider, Gerd Stumme

Abstract: The curse of dimensionality in the realm of association rules is twofold. Firstly, we have the well-known exponential increase in computational complexity with increasing item set size. Secondly, there is a related curse concerned with the distribution of (sparse) data itself in high dimension. The former problem is often coped with by projection, i.e., feature selection, whereas the best known strategy for the latter is avoidance. This work summarizes the first attempt to provide a computationally feasible method for measuring the extent of dimension curse present in a data set with respect to a particular class of machine learning procedures. This recent development enables the application of various other methods from geometric analysis to be investigated and applied in machine learning procedures in the presence of high dimension.

Submitted 15 May, 2018; originally announced May 2018.

Comments: 4 pages, 1 figure

MSC Class: 68T01 68T05 ACM Class: I.2.6

arXiv:1801.07985 [pdf, other]

doi 10.2748/tmj.20201015a

Intrinsic Dimension of Geometric Data Sets

Authors: Tom Hanika, Friedrich Martin Schneider, Gerd Stumme

Abstract: The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from computer science. Among the mathematical insights into data dimensionality, there is an intimate link between the dimension curse and the phenomenon of measure concentration, which makes the former accessible to methods of geometric analysis. The present work provides a comprehensive study of the intrinsic geometry of a data set, based on Gromov's metric measure geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we define a concept of geometric data set and introduce a metric as well as a partial order on the set of isomorphism classes of such data sets. Based on these objects, we propose and investigate an axiomatic approach to the intrinsic dimension of geometric data sets and establish a concrete dimension function with the desired properties. Our model for data sets and their intrinsic dimension is computationally feasible and, moreover, adaptable to specific ML/KD-algorithms, as illustrated by various experiments.

Submitted 26 October, 2020; v1 submitted 24 January, 2018; originally announced January 2018.

Comments: v3: 33 pages, 3 figures, 2 tables

MSC Class: 03G10 51F99 68P05 68T01 ACM Class: I.2.6

Journal ref: Tohoku Math. J. (2) 74 (2022) 23-52

arXiv:1709.06070 [pdf, ps, other]

doi 10.1090/proc/14343

MacWilliams' extension theorem for infinite rings

Authors: Friedrich Martin Schneider, Jens Zumbrägel

Abstract: Finite Frobenius rings have been characterized as precisely those finite rings satisfying the MacWilliams extension property, by work of Wood. In the present note we offer a generalization of this remarkable result to the realm of Artinian rings. Namely, we prove that a left Artinian ring has the left MacWilliams property if and only if it is left pseudo-injective and its finitary left socle embeds into the semisimple quotient. Providing a topological perspective on the MacWilliams property, we also show that the finitary left socle of a left Artinian ring embeds into the semisimple quotient if and only if it admits a finitarily left torsion-free character, if and only if the Pontryagin dual of the regular left module is almost monothetic. In conclusion, an Artinian ring has the MacWilliams property if and only if it is finitarily Frobenius, i.e., it is quasi-Frobenius and its finitary socle embeds into the semisimple quotient.

Submitted 8 August, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

Comments: 14 pages. To appear in Proceedings of the AMS

MSC Class: 16L60; 16P20; 94B05

Journal ref: Proc. Amer. Math. Soc. 147 (2019), 947-961

Showing 1–5 of 5 results for author: Schneider, F M