Showing 1–2 of 2 results for author: Alvarez-Ayllon, A

Search v0.5.6 released 2020-02-24

arXiv:2212.08960

cs.LG cs.NE

Two-sample test based on Self-Organizing Maps

Authors: Alejandro Álvarez-Ayllón, Manuel Palomo-Duarte, Juan-Manuel Dodero

Abstract: Machine-learning classifiers can be leveraged as a two-sample statistical test. Suppose each sample is assigned a different label and that a classifier can obtain a better-than-chance result discriminating them. In this case, we can infer that both samples originate from different populations. However, many types of models, such as neural networks, behave as a black box for the user: they can reject the hypothesis that both samples originate from the same population, but they do not offer insight into how the samples differ. Self-Organizing Maps are a dimensionality reduction technique initially devised as a data visualization tool that displays emergent properties, and they are also useful for classification tasks. Since they can be used as classifiers, they can also be used as a two-sample statistical test. But since their original purpose is visualization, they can also offer insight into how the samples differ.
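The abstract describes the classifier-based two-sample test only in words. A minimal Python sketch of the idea follows, assuming two equal-sized samples of numeric vectors (so chance accuracy is 50%), a small online SOM (4×4 grid, Gaussian neighbourhood, linearly decaying learning rate and radius) used as a majority-vote classifier, and a one-sided binomial test on held-out accuracy. All function names and hyperparameters are hypothetical illustration choices, not the authors' implementation.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(points, rows=4, cols=4, epochs=20, seed=0):
    """Fit a rows x cols Self-Organizing Map (minimal online variant, hypothetical settings)."""
    rng = random.Random(seed)
    # initialise each node's weight vector with a copy of a randomly chosen data point
    weights = {(r, c): list(rng.choice(points)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.5  # neighbourhood radius shrinks over time
        order = list(points)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in data space
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def binomial_upper_tail(k, n, prob=0.5):
    """P(X >= k) for X ~ Binomial(n, prob): chance of k or more correct guesses by luck."""
    return sum(math.comb(n, i) * prob ** i * (1 - prob) ** (n - i) for i in range(k, n + 1))


def som_two_sample_test(sample_a, sample_b, seed=0):
    """Label samples 0/1, fit a SOM on half the pooled data, test held-out accuracy vs chance."""
    rng = random.Random(seed)
    data = [(p, 0) for p in sample_a] + [(p, 1) for p in sample_b]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    weights = train_som([p for p, _ in train], seed=seed)
    # per-node label counts from the training half
    counts = {node: [0, 0] for node in weights}
    for p, y in train:
        counts[best_matching_unit(weights, p)][y] += 1
    # each node predicts its majority label; empty nodes and ties default to label 0
    node_label = {n: (1 if c[1] > c[0] else 0) for n, c in counts.items()}
    correct = sum(node_label[best_matching_unit(weights, p)] == y for p, y in test)
    # under H0 (same population, balanced samples) correct ~ Binomial(len(test), 0.5)
    p_value = binomial_upper_tail(correct, len(test))
    return correct / len(test), p_value, counts
```

A small p-value rejects the hypothesis that both samples come from the same population. The per-node `counts` are where a SOM differs from a black-box classifier: nodes dominated by one label point at the regions of the input space where the samples differ, which is the kind of insight the abstract alludes to.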

Submitted 17 December, 2022; originally announced December 2022.

Comments: 27 pages

MSC Class: 62-08 ACM Class: G.3

arXiv:2104.09809

cs.DB

Inference of Common Multidimensional Equally-Distributed Attributes

Authors: Alejandro Alvarez-Ayllon, Manuel Palomo-Duarte, Juan-Manuel Dodero

Abstract: Given two relations containing multiple measurements, possibly with uncertainties, our objective is to find which sets of attributes from the first have a corresponding set in the second, using only a sample of the data. This approach could be used even when the associated metadata is damaged, missing or incomplete, or when the volume is too big for exact methods. This problem is similar to the search for Inclusion Dependencies (IND), a type of rule over two relations asserting that, for a set of attributes X from the first, every combination of values appears in a set Y from the second. Existing INDs can be found by exploiting the existence of a partial order relation called specialization. However, this relation is based on set theory, requiring the values to be directly comparable. Statistical tests are an intuitive replacement, but how they would affect the underlying assumptions has not been studied. In this paper we formally review the effect that a statistical approach has on the inference rules applied to IND discovery. Our results confirm the intuition that statistical tests can be used, but not in a directly equivalent manner. We provide a workable alternative based on a "hierarchy of null hypotheses", allowing for the automatic discovery of multidimensional, equally distributed sets of attributes.
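To make the contrast in the abstract concrete, here is a minimal Python sketch of the exact set-based IND check next to a statistical substitute for a single attribute. It assumes relations are lists of dicts; the function names are hypothetical, and the two-sample Kolmogorov-Smirnov test (with its asymptotic p-value) is a swapped-in example of a statistical test, not necessarily the one used in the paper. It covers only the one-dimensional case, whereas the paper targets multidimensional attribute sets.

```python
import math


def holds_ind(r, x_attrs, s, y_attrs):
    """Exact IND r[X] <= s[Y]: every combination of X values in r appears as a Y combination in s."""
    left = {tuple(row[a] for a in x_attrs) for row in r}
    right = {tuple(row[a] for a in y_attrs) for row in s}
    return left <= right


def kolmogorov_sf(lam):
    """P(K > lam) for the Kolmogorov distribution, the asymptotic null law of the KS statistic."""
    if lam <= 0:
        return 1.0
    if lam < 1.0:
        # theta-function form of the CDF, which converges quickly for small lam
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2)) for k in range(1, 51)
        )
        return 1.0 - cdf
    # alternating series for the tail, which converges quickly for lam >= 1
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 51))


def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value for one attribute."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # step both empirical CDFs past every copy of v so that ties are handled correctly
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d, kolmogorov_sf(d * math.sqrt(n * m / (n + m)))
```

The exact check fails on a single unmatched value and requires values to be directly comparable. The statistical version only ever fails to reject equal distributions, which is one plausible reason why inference rules built on set inclusion cannot simply be reused unchanged.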

Submitted 19 July, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 11 pages, 2 figures

MSC Class: 68P20 (Primary); 62P99; 62B10 (Secondary) ACM Class: E.m; G.3; H.3.3
