-
A Survey on Archetypal Analysis
Authors:
Aleix Alcacer,
Irene Epifanio,
Sebastian Mair,
Morten Mørup
Abstract:
Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure to extract the distinct aspects called archetypes in observations with each observational record approximated as a mixture (i.e., convex combination) of these archetypes. AA thereby provides straightforward, interpretable, and explainable representations for feature extraction and dimensionality reduction, facilitating the understanding of the structure of high-dimensional data with wide applications throughout the sciences. However, AA also faces challenges, particularly as the associated optimization problem is non-convex. This survey provides researchers and data mining practitioners an overview of methodologies and opportunities that AA has to offer surveying the many applications of AA across disparate fields of science, as well as best practices for modeling data using AA and limitations. The survey concludes by explaining important future research directions concerning AA.
Submitted 16 April, 2025;
originally announced April 2025.
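The convex-combination idea in the abstract above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the survey's algorithm: the names `aa_reconstruction_error`, `A`, `B`, and `Z` are hypothetical, and the weights are drawn at random rather than optimized. Archetypes `Z = B @ X` are convex combinations of observations, and each observation is approximated by a convex combination `A @ Z` of the archetypes.

```python
import numpy as np


def aa_reconstruction_error(X, A, B):
    """Residual sum of squares for the approximation X ~ A @ (B @ X).

    X: (n, d) data matrix.
    B: (k, n) weights; each row lies on the simplex, so Z = B @ X holds
       k archetypes that are convex combinations of observations.
    A: (n, k) weights; each row lies on the simplex, so A @ Z holds
       convex combinations of the archetypes.
    """
    Z = B @ X  # archetypes, shape (k, d)
    return float(np.sum((X - A @ Z) ** 2))


# Random (unoptimized) weights, used only to show shapes and constraints.
rng = np.random.default_rng(0)
n, d, k = 20, 2, 3
X = rng.normal(size=(n, d))
A = rng.dirichlet(np.ones(k), size=n)  # (n, k), rows sum to 1
B = rng.dirichlet(np.ones(n), size=k)  # (k, n), rows sum to 1
rss = aa_reconstruction_error(X, A, B)
```

Fitting AA means minimizing this error over `A` and `B` subject to the simplex constraints, which is the non-convex problem the abstract mentions.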
-
Biarchetype analysis: simultaneous learning of observations and features based on extremes
Authors:
Aleix Alcacer,
Irene Epifanio,
Ximo Gual-Arnau
Abstract:
We introduce a novel exploratory technique, termed biarchetype analysis, which extends archetype analysis to simultaneously identify archetypes of both observations and features. This innovative unsupervised machine learning tool aims to represent observations and features through instances of pure types, or biarchetypes, which are easily interpretable as they embody mixtures of observations and features. Furthermore, the observations and features are expressed as mixtures of the biarchetypes, which makes the structure of the data easier to understand. We propose an algorithm to solve biarchetype analysis. Although clustering is not the primary aim of this technique, biarchetype analysis is demonstrated to offer significant advantages over biclustering methods, particularly in terms of interpretability. This is attributed to biarchetypes being extreme instances, in contrast to the centroids produced by biclustering, which inherently enhances human comprehension. The application of biarchetype analysis across various machine learning challenges underscores its value, and both the source code and examples are readily accessible in R and Python at https://github.com/aleixalcacer/JA-BIAA.
Submitted 22 May, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Ordinal classification for interval-valued data and interval-valued functional data
Authors:
Aleix Alcacer,
Marina Martínez-Garcia,
Irene Epifanio
Abstract:
The aim of ordinal classification is to predict the ordered labels of the output from a set of observed inputs. Interval-valued data refers to data in the form of intervals. For the first time, interval-valued data and interval-valued functional data are considered as inputs in an ordinal classification problem. Six ordinal classifiers for interval data and interval-valued functional data are proposed. Three of them are parametric, one of them is based on ordinal binary decompositions and the other two are based on ordered logistic regression. The other three methods are based on the use of distances between interval data and kernels on interval data. One of the methods uses the weighted $k$-nearest-neighbor technique for ordinal classification. Another method considers kernel principal component analysis plus an ordinal classifier. And the sixth method, which is the method that performs best, uses a kernel-induced ordinal random forest. They are compared with naïve approaches in an extensive experimental study with synthetic and original real data sets, about human global development, and weather data. The results show that considering ordering and interval-valued information improves the accuracy. The source code and data sets are available at https://github.com/aleixalcacer/OCFIVD.
Submitted 30 October, 2023;
originally announced October 2023.
-
Finding archetypal patterns for binary questionnaires
Authors:
Ismael Cabero,
Irene Epifanio
Abstract:
Archetypal analysis is an exploratory tool that explains a set of observations as mixtures of pure (extreme) patterns. If the patterns are actual observations of the sample, we refer to them as archetypoids. For the first time, we propose to use archetypoid analysis for binary observations. This tool can contribute to the understanding of a binary data set, as in the multivariate case. We illustrate the advantages of the proposed methodology in a simulation study and two applications, one exploring objects (rows) and the other exploring items (columns). One is related to determining student skill set profiles and the other to describing item response functions.
Submitted 28 February, 2020;
originally announced March 2020.
-
Robust multivariate and functional archetypal analysis with application to financial time series analysis
Authors:
Jesús Moliner,
Irene Epifanio
Abstract:
Archetypal analysis approximates data by means of mixtures of actual extreme cases (archetypoids) or archetypes, which are a convex combination of cases in the data set. Archetypes lie on the boundary of the convex hull. This makes the analysis very sensitive to outliers. A robust methodology by means of M-estimators for classical multivariate and functional data is proposed. This unsupervised methodology allows complex data to be understood even by non-experts. The performance of the new procedure is assessed in a simulation study, where a comparison with a previous methodology for the multivariate case is also carried out, and our proposal obtains favorable results. Finally, robust bivariate functional archetypoid analysis is applied to a set of companies in the S&P 500 described by two time series of stock quotes. A new graphic representation is also proposed to visualize the results. The analysis shows how the information can be easily interpreted and how even non-experts can gain a qualitative understanding of the data.
Submitted 22 December, 2018; v1 submitted 1 October, 2018;
originally announced October 2018.
-
Generalized partially linear models on Riemannian manifolds
Authors:
Amelia Simó,
M. Victoria Ibáñez,
Irene Epifanio,
Vicent Gimeno
Abstract:
The generalized partially linear models on Riemannian manifolds are introduced. These models, like ordinary generalized linear models, are a generalization of partially linear models on Riemannian manifolds that allow for response variables with error distribution models other than a normal distribution. Partially linear models are particularly useful when some of the covariates of the model are elements of a Riemannian manifold, because the curvature of these spaces makes it difficult to define parametric models. The model was developed to address an interesting application, the prediction of children's garment fit based on 3D scanning of their body. For this reason, we focus on logistic and ordinal models and on the important and difficult case where the Riemannian manifold is the three-dimensional case of Kendall's shape space. An experimental study with a well-known 3D database is carried out to check the goodness of the procedure. Finally it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population. A comparative study with related techniques is carried out.
Submitted 8 March, 2018;
originally announced March 2018.
-
Functional archetype and archetypoid analysis
Authors:
Irene Epifanio
Abstract:
Archetype and archetypoid analysis can be extended to functional data. Each function is represented as a mixture of actual observations (functional archetypoids) or functional archetypes, which are a mixture of observations in the data set. Well-known Canadian temperature data are used to illustrate the analysis developed. Computational methods are proposed for performing these analyses, based on the coefficients of a basis. Unlike a previous attempt to compute functional archetypes, which was only valid for an orthogonal basis, the proposed methodology can be used for any basis. It is computationally less demanding than the simple approach of discretizing the functions. Multivariate functional archetype and archetypoid analysis are also introduced and applied in an interesting problem about the study of human development around the world over the last 50 years. These tools can contribute to the understanding of a functional data set, as in the multivariate case.
Submitted 25 June, 2016; v1 submitted 26 January, 2016;
originally announced January 2016.