Interpreting Distortions in Dimensionality Reduction by Superimposing Neighbourhood Graphs
Authors:
Benoît Colange,
Laurent Vuillon,
Sylvain Lespinats,
Denys Dutykh
Abstract:
To support visual data exploration, many dimensionality reduction methods have been developed. These tools allow data analysts to represent multidimensional data in a 2D or 3D space while preserving as much relevant information as possible. Yet they cannot preserve all structures simultaneously, and they induce some unavoidable distortions. Hence, many criteria have been introduced to evaluate a map's overall quality, mostly based on the preservation of neighbourhoods. Such global indicators are currently used to compare several maps, which helps to choose the most appropriate mapping method and its hyperparameters. However, those aggregated indicators tend to hide the local distribution of distortions. They therefore need to be supplemented by local evaluation to ensure correct interpretation of maps. In this paper, we describe a new method, called MING, for 'Map Interpretation using Neighbourhood Graphs'. It offers a graphical interpretation of pairs of map quality indicators, as well as local evaluation of the distortions. This is done by displaying on the map the nearest-neighbour graphs computed in the data space and in the embedding. Shared and unshared edges show, respectively, the reliable and unreliable neighbourhood information conveyed by the mapping. By this means, analysts may determine whether proximity (or remoteness) of points on the map faithfully represents similarity (or dissimilarity) of the original data, in the sense of a chosen map quality criterion. We apply this approach to two pairs of indicators, precision/recall and trustworthiness/continuity, chosen because their wide use in the community should make the method easy for users to adopt.
Submitted 20 September, 2019;
originally announced September 2019.
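The MING abstract above describes comparing the nearest-neighbour graph of the data space with that of the embedding and reading shared and unshared edges as reliable and unreliable neighbourhood information. The following is a minimal sketch of that comparison only, not the authors' implementation: the function names `knn_edges` and `compare_neighbourhoods`, the brute-force Euclidean search, the directed-edge convention, and the toy points are all assumptions made for illustration.

```python
from math import dist


def knn_edges(points, k):
    """Return the directed k-nearest-neighbour edges (i, j) by brute force."""
    edges = set()
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance; ties break on index.
        ranked = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        edges.update((i, j) for j in ranked[:k])
    return edges


def compare_neighbourhoods(data, embedding, k):
    """Split edges into those kept, lost, or invented by the mapping."""
    data_edges = knn_edges(data, k)
    map_edges = knn_edges(embedding, k)
    return {
        "shared": data_edges & map_edges,  # neighbours in both spaces
        "missed": data_edges - map_edges,  # data-space neighbours lost on the map
        "false": map_edges - data_edges,   # map neighbours absent from the data space
    }


if __name__ == "__main__":
    # Hypothetical toy data: two pairs of 3D points that a flattening
    # projection squeezes on top of each other.
    data = [(0, 0, 0), (1, 0, 0), (0, 0, 5), (1, 0, 5)]
    embedding = [(0, 0), (1, 0), (0, 0.1), (1, 0.1)]
    for name, edges in compare_neighbourhoods(data, embedding, k=1).items():
        print(name, sorted(edges))
```

In a plot, the three edge sets could be drawn on the map in different colours, so that each region shows at a glance which neighbourhood relations can be trusted.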
A new supervised non-linear mapping
Authors:
Sylvain Lespinats,
Anke Meyer-Baese,
Michael Aupetit
Abstract:
Supervised mapping methods project multi-dimensional labeled data onto a 2-dimensional space, attempting to preserve both data similarities and the topology of classes. Supervised mappings are expected to help the user understand the underlying original class structure and classify new data visually. Several methods have been designed to achieve supervised mapping, but many of them modify the original distances prior to the mapping, so that original data similarities are corrupted and even overlapping classes tend to be separated on the map, ignoring their original topology. We propose ClassiMap, an alternative method for supervised mapping. Mappings come with distortions, which can be split between tears (close points mapped far apart) and false neighborhoods (points far apart mapped as neighbors). Some mapping methods favor the former while others favor the latter. ClassiMap switches between such mapping methods so that tears tend to appear between classes and false neighborhoods within classes, better preserving the classes' topology. We also propose two new objective criteria, instead of the usual subjective visual inspection, to perform fair comparisons of supervised mapping methods. ClassiMap appears to be the best supervised mapping method according to these criteria in our experiments on synthetic and real datasets.
Submitted 9 March, 2012;
originally announced March 2012.
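The ClassiMap abstract above distinguishes tears from false neighborhoods and states that tears are preferable between classes and false neighborhoods within classes. The sketch below tallies both kinds of distortion by whether the two points share a label. It is an illustration under stated assumptions, not ClassiMap itself and not either of the two criteria the paper proposes: the k-nearest-neighbor definition of "close", the names `knn_pairs` and `tally_distortions`, and the toy data are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_pairs(points, k):
    """Directed pairs (i, j) where j is among the k nearest points to i."""
    pairs = set()
    for i, p in enumerate(points):
        ranked = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        pairs.update((i, j) for j in ranked[:k])
    return pairs


def tally_distortions(data, embedding, labels, k):
    """Count tears and false neighborhoods, split by class relationship."""
    near_in_data = knn_pairs(data, k)
    near_on_map = knn_pairs(embedding, k)
    distortions = (
        ("tear", near_in_data - near_on_map),                # close points mapped apart
        ("false_neighborhood", near_on_map - near_in_data),  # distant points mapped close
    )
    tally = Counter()
    for kind, pairs in distortions:
        for i, j in pairs:
            scope = "within_class" if labels[i] == labels[j] else "between_classes"
            tally[(kind, scope)] += 1
    return tally


if __name__ == "__main__":
    # Hypothetical toy case: two well-separated classes squeezed together
    # by a projection that drops the third coordinate.
    data = [(0, 0, 0), (1, 0, 0), (0, 0, 5), (1, 0, 5)]
    embedding = [(0, 0), (1, 0), (0, 0.1), (1, 0.1)]
    labels = ["a", "a", "b", "b"]
    for key, count in sorted(tally_distortions(data, embedding, labels, k=1).items()):
        print(key, count)
```

On this toy case every tear falls within a class and every false neighborhood falls between classes, which is the opposite of the pattern the abstract describes as desirable.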