-
TopoMap++: A faster and more space efficient technique to compute projections with topological guarantees
Authors:
Vitoria Guardieiro,
Felipe Inagaki de Oliveira,
Harish Doraiswamy,
Luis Gustavo Nonato,
Claudio Silva
Abstract:
High-dimensional data, characterized by many features, can be difficult to visualize effectively. Dimensionality reduction techniques, such as PCA, UMAP, and t-SNE, address this challenge by projecting the data into a lower-dimensional space while preserving important relationships. TopoMap is another technique that excels at preserving the underlying structure of the data, leading to interpretable visualizations. In particular, TopoMap maps the high-dimensional data into a visual space, guaranteeing that the 0-dimensional persistence diagram of the Rips filtration of the visual space matches the one from the high-dimensional data. However, the original TopoMap algorithm can be slow, and its layout can be too sparse for large and complex datasets. In this paper, we propose three improvements to TopoMap: 1) a more space-efficient layout, 2) a significantly faster implementation, and 3) a novel TreeMap-based representation that makes use of the topological hierarchy to aid the exploration of the projections. These advancements make TopoMap, now referred to as TopoMap++, a more powerful tool for visualizing high-dimensional data, which we demonstrate through different use case scenarios.
Submitted 11 September, 2024;
originally announced September 2024.
-
MOUNTAINEER: Topology-Driven Visual Analytics for Comparing Local Explanations
Authors:
Parikshit Solunke,
Vitoria Guardieiro,
Joao Rulff,
Peter Xenopoulos,
Gromit Yeuk-Yin Chan,
Brian Barr,
Luis Gustavo Nonato,
Claudio Silva
Abstract:
With the increasing use of black-box Machine Learning (ML) techniques in critical applications, there is a growing demand for methods that can provide transparency and accountability for model predictions. As a result, a large number of local explainability methods for black-box models have been developed and popularized. However, machine learning explanations are still hard to evaluate and compare due to the high dimensionality, heterogeneous representations, varying scales, and stochastic nature of some of these methods. Topological Data Analysis (TDA) can be an effective method in this domain since it can be used to transform attributions into uniform graph representations, providing a common ground for comparison across different explanation methods.
We present a novel topology-driven visual analytics tool, Mountaineer, that allows ML practitioners to interactively analyze and compare these representations by linking the topological graphs back to the original data distribution, model predictions, and feature attributions. Mountaineer facilitates rapid and iterative exploration of ML explanations, enabling experts to gain deeper insights into the explanation techniques, understand the underlying data distributions, and thus reach well-founded conclusions about model behavior. Furthermore, we demonstrate the utility of Mountaineer through two case studies using real-world data. In the first, we show how Mountaineer enabled us to compare black-box ML explanations and discern regions of and causes of disagreements between different explanations. In the second, we demonstrate how the tool can be used to compare and understand ML models themselves. Finally, we conducted interviews with three industry experts to help us evaluate our work.
Submitted 21 June, 2024;
originally announced June 2024.
-
Exploring the Relationship Between Feature Attribution Methods and Model Performance
Authors:
Priscylla Silva,
Claudio T. Silva,
Luis Gustavo Nonato
Abstract:
Machine learning and deep learning models are pivotal in educational contexts, particularly in predicting student success. Despite their widespread application, a significant gap persists in comprehending the factors influencing these models' predictions, especially in explainability within education. This work addresses this gap by employing nine distinct explanation methods and conducting a comprehensive analysis to explore the correlation between the agreement among these methods in generating explanations and the predictive model's performance. Applying Spearman's correlation, our findings reveal a very strong correlation between the model's performance and the agreement level observed among the explanation methods.
Submitted 22 May, 2024;
originally announced May 2024.
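
As an illustration of the statistic named in the abstract above, the following is a minimal sketch of Spearman's rank correlation in plain Python. It is not the authors' code: the function names `average_ranks` and `spearman` are hypothetical, and any inputs passed to them (for example, one model-performance score and one explanation-agreement score per model) are assumptions about how such an analysis might be set up.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant (zero rank variance).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because the statistic depends only on ranks, it measures how monotonically one quantity follows the other rather than how linearly, which suits scores that live on different scales.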
-
T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients
Authors:
Evandro S. Ortigossa,
Fábio F. Dias,
Brian Barr,
Claudio T. Silva,
Luis Gustavo Nonato
Abstract:
The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. Modern learning models, while powerful, often exhibit a complexity level that renders them opaque black boxes, lacking transparency and hindering our understanding of their decision-making processes. Opacity challenges the practical application of machine learning, especially in critical domains requiring informed decisions. Explainable Artificial Intelligence (XAI) addresses that challenge, unraveling the complexity of black boxes by providing explanations. Feature attribution/importance XAI stands out for its ability to delineate the significance of input features in predictions. However, most attribution methods have limitations, such as instability, when divergent explanations result from similar or the same instance. This work introduces T-Explainer, a novel additive attribution explainer based on the Taylor expansion that offers desirable properties such as local accuracy and consistency. We demonstrate T-Explainer's effectiveness and stability over multiple runs in quantitative benchmark experiments against well-known attribution methods. Additionally, we provide several tools to evaluate and visualize explanations, turning T-Explainer into a comprehensive XAI framework.
Submitted 24 April, 2025; v1 submitted 25 April, 2024;
originally announced April 2024.
-
TensorAnalyzer: Identification of Urban Patterns in Big Cities using Non-Negative Tensor Factorization
Authors:
Jaqueline Silveira,
Germain García,
Afonso Paiva,
Marcelo Nery,
Sergio Adorno,
Luis Gustavo Nonato
Abstract:
Extracting relevant urban patterns from multiple data sources can be difficult using classical clustering algorithms, since their hyperparameters must be set up suitably and outliers must be handled. This difficulty should be addressed correctly to help urban planners in the decision-making process for the further development of a big city. For instance, experts' main interest in criminology is comprehending the relationship between crimes and the socio-economic characteristics at specific georeferenced locations. In addition, the classical clustering algorithms take little notice of the intricate spatial correlations in georeferenced data sources. This paper presents a new approach to detecting the most relevant urban patterns from multiple data sources based on tensor decomposition. The proposed approach's performance is compared against classical methods to validate the quality of the identified patterns. The results indicate that the approach can effectively identify functional patterns that characterize the data set for further analysis while achieving good clustering quality. Furthermore, we developed a generic framework named TensorAnalyzer, in which the effectiveness and usefulness of the proposed methodology are tested through a set of experiments and a real-world case study showing the relationship between crime events around schools, students' performance, and other variables involved in the analysis.
Submitted 5 October, 2022;
originally announced October 2022.
-
Calibrate: Interactive Analysis of Probabilistic Model Output
Authors:
Peter Xenopoulos,
Joao Rulff,
Luis Gustavo Nonato,
Brian Barr,
Claudio Silva
Abstract:
Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, like accuracy, many applications, such as weather prediction, sports betting, or patient risk prediction, rely on a classifier's predicted probabilities rather than predicted labels. In these instances, practitioners are concerned with producing a calibrated model, that is, one which outputs probabilities that reflect those of the true distribution. Model calibration is often analyzed visually through static reliability diagrams; however, the traditional calibration visualization may suffer from a variety of drawbacks due to the strong aggregations it necessitates. Furthermore, count-based approaches are unable to sufficiently analyze model calibration. We present Calibrate, an interactive reliability diagram that addresses the aforementioned issues. Calibrate constructs a reliability diagram that is resistant to drawbacks in traditional approaches, and allows for interactive subgroup analysis and instance-level inspection. We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data. We further validate Calibrate by presenting the results of a think-aloud experiment with data scientists who routinely analyze model calibration.
Submitted 27 July, 2022;
originally announced July 2022.
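
To make the "strong aggregations" behind a traditional static reliability diagram concrete, here is a minimal sketch of the usual equal-width binning step. It illustrates the conventional baseline only, not Calibrate's own construction; the function name `reliability_bins` and the default of ten bins are assumptions made for this example.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Aggregate binary-classifier predictions into equal-width probability bins.

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : true labels, 0 or 1, aligned with probs
    Returns one (mean_predicted, observed_frequency, count) tuple per
    non-empty bin, ordered from the lowest bin to the highest.
    """
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]
```

Plotting the mean predicted probability against the observed frequency for each bin gives the diagram; a well-calibrated model lies near the diagonal. Everything inside a bin is collapsed into one point, so the choice of `n_bins` and the per-bin counts strongly affect what the plot shows.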
-
LegalVis: Exploring and Inferring Precedent Citations in Legal Documents
Authors:
Lucas E. Resck,
Jean R. Ponciano,
Luis Gustavo Nonato,
Jorge Poco
Abstract:
To reduce the number of pending cases and conflicting rulings in the Brazilian Judiciary, the National Congress amended the Constitution, allowing the Brazilian Supreme Court (STF) to create binding precedents (BPs), i.e., a set of understandings that both Executive and lower Judiciary branches must follow. The STF's justices frequently cite the 58 existing BPs in their decisions, and it is of primary relevance that judicial experts could identify and analyze such citations. To assist in this problem, we propose LegalVis, a web-based visual analytics system designed to support the analysis of legal documents that cite or could potentially cite a BP. We model the problem of identifying potential citations (i.e., non-explicit) as a classification problem. However, a simple score is not enough to explain the results; that is why we use an interpretability machine learning method to explain the reason behind each identified citation. For a compelling visual exploration of documents and BPs, LegalVis comprises three interactive visual components: the first presents an overview of the data showing temporal patterns, the second allows filtering and grouping relevant documents by topic, and the last one shows a document's text aiming to interpret the model's output by pointing out which paragraphs are likely to mention the BP, even if not explicitly specified. We evaluated our identification model and obtained an accuracy of 96%; we also made a quantitative and qualitative analysis of the results. The usefulness and effectiveness of LegalVis were evaluated through two usage scenarios and feedback from six domain experts.
Submitted 3 March, 2022;
originally announced March 2022.
-
Topological Representations of Local Explanations
Authors:
Peter Xenopoulos,
Gromit Chan,
Harish Doraiswamy,
Luis Gustavo Nonato,
Brian Barr,
Claudio Silva
Abstract:
Local explainability methods -- those which seek to generate an explanation for each prediction -- are becoming increasingly prevalent due to the need for practitioners to rationalize their model outputs. However, comparing local explainability methods is difficult since they each generate outputs in various scales and dimensions. Furthermore, due to the stochastic nature of some explainability methods, it is possible for different runs of a method to produce contradictory explanations for a given observation. In this paper, we propose a topology-based framework to extract a simplified representation from a set of local explanations. We do so by first modeling the relationship between the explanation space and the model predictions as a scalar function. Then, we compute the topological skeleton of this function. This topological skeleton acts as a signature for such functions, which we use to compare different explanation methods. We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provides stable representations. Then, we show how our framework can be used to identify appropriate parameters for local explainability methods. Our framework is simple, does not require complex optimizations, and can be broadly applied to most local explanation methods. We believe the practicality and versatility of our approach will help promote topology-based approaches as a tool for understanding and comparing explanation methods.
Submitted 6 January, 2022;
originally announced January 2022.
-
TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data
Authors:
Harish Doraiswamy,
Julien Tierny,
Paulo J. S. Silva,
Luis Gustavo Nonato,
Claudio Silva
Abstract:
Multidimensional Projection is a fundamental tool for high-dimensional data analytics and visualization. With very few exceptions, projection techniques are designed to map data from a high-dimensional space to a visual space so as to preserve some dissimilarity (similarity) measure, such as the Euclidean distance for example. In fact, although adopting distinct mathematical formulations designed to favor different aspects of the data, most multidimensional projection methods strive to preserve dissimilarity measures that encapsulate geometric properties such as distances or the proximity relation between data objects. However, geometric relations are not the only interesting property to be preserved in a projection. For instance, the analysis of particular structures such as clusters and outliers could be more reliably performed if the mapping process gives some guarantee as to topological invariants such as connected components and loops. This paper introduces TopoMap, a novel projection technique which provides topological guarantees during the mapping process. In particular, the proposed method performs the mapping from a high-dimensional space to a visual space, while preserving the 0-dimensional persistence diagram of the Rips filtration of the high-dimensional data, ensuring that the filtrations generate the same connected components when applied to the original as well as projected data. The presented case studies show that the topological guarantee provided by TopoMap not only brings confidence to the visual analytic process but also can be used to assist in the assessment of other projection methods.
Submitted 3 September, 2020;
originally announced September 2020.
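
The guarantee described in the TopoMap abstract above can be checked on small examples. The sketch below is not the TopoMap algorithm: it only computes the 0-dimensional persistence deaths of a point set, relying on the standard fact that they coincide with the edge lengths of a Euclidean minimum spanning tree (here found with Kruskal's algorithm). The function name `zero_dim_persistence_deaths`, the sample points, and the convention that an edge enters the filtration at its full length (some texts use half the length) are assumptions made for illustration.

```python
import math
from itertools import combinations


def zero_dim_persistence_deaths(points):
    """Return the sorted death values of the 0-dimensional Rips persistence pairs.

    Every point is born at scale 0. A connected component dies at the length
    of the edge that first merges it into another component, which is exactly
    an edge of a minimum spanning tree. The one component that never dies
    (the essential class) is not included.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Hypothetical data: four points in 3D and a hand-made 2D layout whose
# minimum spanning tree has the same edge lengths.
high_dim = [(0, 0, 0), (1, 0, 0), (1, 2, 0), (1, 2, 4)]
layout_2d = [(0, 0), (1, 0), (3, 0), (7, 0)]
same_diagram = (
    zero_dim_persistence_deaths(high_dim) == zero_dim_persistence_deaths(layout_2d)
)
```

Comparing the two death lists is a simple way to test whether a given 2D layout preserves the 0-dimensional persistence diagram of the original data. The all-pairs edge list makes this sketch quadratic in the number of points, so it is only suitable for small inputs.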
-
Melody: Generating and Visualizing Machine Learning Model Summary to Understand Data and Classifiers Together
Authors:
Gromit Yeuk-Yin Chan,
Enrico Bertini,
Luis Gustavo Nonato,
Brian Barr,
Claudio T. Silva
Abstract:
With the increasing sophistication of machine learning models, there is a growing trend of developing model explanation techniques that focus on only one instance (local explanation) to ensure faithfulness to the original model. While these techniques provide accurate model interpretability on various data primitives (e.g., tabular, image, or text), a holistic Explainable Artificial Intelligence (XAI) experience also requires a global explanation of the model and dataset to enable sensemaking at different levels of granularity. Thus, there is a vast potential in synergizing the model explanation and visual analytics approaches. In this paper, we present MELODY, an interactive algorithm to construct an optimal global overview of the model and data behavior by summarizing the local explanations using information theory. The result (i.e., an explanation summary) does not require additional learning models, restrictions of data primitives, or the knowledge of machine learning from the users. We also design MELODY UI, an interactive visual analytics system to demonstrate how the explanation summary connects the dots in various XAI tasks from a global overview to local inspections. We present three usage scenarios regarding tabular, image, and text classifications to illustrate how to generalize model interpretability of different data. Our experiments show that our approaches (1) provide a better explanation summary compared to a straightforward information-theoretic summarization and (2) achieve a significant speedup in the end-to-end data modeling pipeline.
Submitted 21 July, 2020;
originally announced July 2020.
-
SUBPLEX: Towards a Better Understanding of Black Box Model Explanations at the Subpopulation Level
Authors:
Jun Yuan,
Gromit Yeuk-Yin Chan,
Brian Barr,
Kyle Overton,
Kim Rees,
Luis Gustavo Nonato,
Enrico Bertini,
Claudio T. Silva
Abstract:
Understanding the interpretation of machine learning (ML) models has been of paramount importance when making decisions with societal impacts such as transport control, financial activities, and medical diagnosis. While current model interpretation methodologies focus on using locally linear functions to approximate the models or creating self-explanatory models that give explanations to each input instance, they do not focus on model interpretation at the subpopulation level, which is the understanding of model interpretations across different subset aggregations in a dataset. To address the challenges of providing explanations of an ML model across the whole dataset, we propose SUBPLEX, a visual analytics system to help users understand black-box model explanations with subpopulation visual analysis. SUBPLEX is designed through an iterative design process with machine learning researchers to address three usage scenarios of real-life machine learning tasks: model debugging, feature selection, and bias detection. The system applies novel subpopulation analysis on ML model explanations and interactive visualization to explore the explanations on a dataset with different levels of granularity. Based on the system, we conduct user evaluation to assess how understanding the interpretation at a subpopulation level influences the sense-making process of interpreting ML models from a user's perspective. Our results suggest that by providing model explanations for different groups of data, SUBPLEX encourages users to generate more ingenious ideas to enrich the interpretations. It also helps users to acquire a tight integration between programming workflow and visual analytics workflow. Last but not least, we summarize the considerations observed in applying visualization to machine learning interpretations.
Submitted 5 May, 2024; v1 submitted 21 July, 2020;
originally announced July 2020.
-
GLoG: Laplacian of Gaussian for Spatial Pattern Detection in Spatio-Temporal Data
Authors:
Luis Gustavo Nonato,
Fabiano Petronetto,
Claudio Silva
Abstract:
Boundary detection has long been a fundamental tool for image processing and computer vision, supporting the analysis of static and time-varying data. In this work, we built upon the theory of Graph Signal Processing to propose a novel boundary detection filter in the context of graphs, having as main application scenario the visual analysis of spatio-temporal data. More specifically, we propose the equivalent for graphs of the so-called Laplacian of Gaussian edge detection filter, which is widely used in image processing. The proposed filter is able to reveal interesting spatial patterns while still enabling the definition of entropy of time slices. The entropy reveals the degree of randomness of a time slice, helping users to identify expected and unexpected phenomena over time. The effectiveness of our approach appears in applications involving synthetic and real data sets, which show that the proposed methodology is able to uncover interesting spatial and temporal phenomena. The provided examples and case studies make clear the usefulness of our approach as a mechanism to support visual analytic tasks involving spatio-temporal data.
Submitted 9 September, 2019;
originally announced September 2019.
-
Motion Browser: Visualizing and Understanding Complex Upper Limb Movement Under Obstetrical Brachial Plexus Injuries
Authors:
Gromit Yeuk-Yin Chan,
Luis Gustavo Nonato,
Alice Chu,
Preeti Raghavan,
Viswanath Aluru,
Claudio T. Silva
Abstract:
The brachial plexus is a complex network of peripheral nerves that enables sensing from and control of the movements of the arms and hand. Nowadays, the coordination between the muscles to generate simple movements is still not well understood, hindering the knowledge of how to best treat patients with this type of peripheral nerve injury. To acquire enough information for medical data analysis, physicians conduct motion analysis assessments with patients to produce a rich dataset of electromyographic signals from multiple muscles recorded with joint movements during real-world tasks. However, tools for the analysis and visualization of the data in a succinct and interpretable manner are currently not available. Without the ability to integrate, compare, and compute multiple data sources in one platform, physicians can only compute simple statistical values to describe a patient's behavior vaguely, which limits the possibility to answer clinical questions and generate hypotheses for research. To address this challenge, we have developed Motion Browser, an interactive visual analytics system which provides an efficient framework to extract and compare muscle activity patterns from the patient's limbs and coordinated views to help users analyze muscle signals, motion data, and video information to address different tasks. The system was developed as a result of a collaborative endeavor between computer scientists and orthopedic surgery and rehabilitation physicians. We present case studies showing physicians can utilize the information displayed to understand how individuals coordinate their muscles to initiate appropriate treatment and generate new hypotheses for future research.
Submitted 22 July, 2019;
originally announced July 2019.