-
VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization
Authors:
Donald Bertucci,
Alex Endert
Abstract:
Variational Autoencoders are widespread in Machine Learning, but are typically explained with dense math notation or static code examples. This paper presents VAE Explainer, an interactive Variational Autoencoder running in the browser to supplement existing static documentation (e.g., Keras Code Examples). VAE Explainer adds interactions to the VAE summary with interactive model inputs, latent space, and output. VAE Explainer connects the high-level understanding with the implementation: annotated code and a live computational graph. The VAE Explainer interactive visualization is live at https://xnought.github.io/vae-explainer and the code is open source at https://github.com/xnought/vae-explainer.
Submitted 13 September, 2024;
originally announced September 2024.
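The abstract centers on a VAE's inputs, latent space, and output. The following is not VAE Explainer's own code, which runs in the browser. It is a minimal plain-Python sketch of the two pieces that define a Gaussian VAE's latent space: the reparameterization (sampling) step and the KL-divergence term of the loss. It assumes an encoder that outputs a mean `mu` and a log-variance `log_var` per latent dimension, the same parameterization the Keras VAE code example uses. The function names are illustrative.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * log_var).

    Drawing eps separately keeps z differentiable with respect to mu and log_var,
    which is why VAEs sample this way instead of drawing z directly.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) for a diagonal Gaussian, summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))
```

For example, `kl_divergence([1.0], [0.0])` is 0.5. With a very negative `log_var`, `reparameterize` returns values essentially equal to `mu`, because the sampling noise vanishes.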
-
Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning
Authors:
Ángel Alexander Cabrera,
Erica Fu,
Donald Bertucci,
Kenneth Holstein,
Ameet Talwalkar,
Jason I. Hong,
Adam Perer
Abstract:
Machine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners discover real-world patterns and validate systematic failures. We conducted 18 semi-structured interviews with ML practitioners to better understand the challenges of behavioral evaluation and found that it is a collaborative, use-case-first process that is not adequately supported by existing task- and domain-specific tools. Using these findings, we designed Zeno, a general-purpose framework for visualizing and testing AI systems across diverse use cases. In four case studies with participants using Zeno on real-world models, we found that practitioners were able to reproduce previous manual analyses and discover new systematic failures.
Submitted 9 February, 2023;
originally announced February 2023.
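The abstract defines behavioral evaluation as checking model outputs for specific types of inputs. As a rough sketch of that idea, and not Zeno's actual API, each input type below is expressed as a hypothetical named predicate (a "slice") over example metadata. Accuracy is computed per slice, and any slice that falls well below overall accuracy is flagged. The record fields (`label`, `pred`, `lang`) and the `margin` threshold are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical record: input metadata plus the gold "label" and the model's "pred".
Example = dict
Slice = Callable[[Example], bool]

def slice_accuracy(examples: List[Example], slices: Dict[str, Slice]) -> Dict[str, float]:
    """Accuracy restricted to the examples each named slice selects (empty slices are skipped)."""
    results = {}
    for name, predicate in slices.items():
        members = [e for e in examples if predicate(e)]
        if members:
            results[name] = sum(e["pred"] == e["label"] for e in members) / len(members)
    return results

def flag_failures(examples: List[Example], slices: Dict[str, Slice], margin: float = 0.1) -> List[str]:
    """Names of slices whose accuracy is more than `margin` below overall accuracy."""
    overall = sum(e["pred"] == e["label"] for e in examples) / len(examples)
    per_slice = slice_accuracy(examples, slices)
    return sorted(name for name, acc in per_slice.items() if acc < overall - margin)
```

A model that scores 67% overall can still fail every example in a small slice. Reporting only the aggregate number hides that kind of systematic failure.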
-
Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL
Authors:
Kin-Ho Lam,
Delyar Tabatabai,
Jed Irvine,
Donald Bertucci,
Anita Ruangrotsakun,
Minsuk Kahng,
Alan Fern
Abstract:
Reinforcement learning (RL) agents are commonly evaluated via their expected value over a distribution of test scenarios. Unfortunately, this evaluation approach provides limited evidence for post-deployment generalization beyond the test distribution. In this paper, we address this limitation by extending the recent CheckList testing methodology from natural language processing to planning-based RL. Specifically, we consider testing RL agents that make decisions via online tree search using a learned transition model and value function. The key idea is to improve the assessment of future performance via a CheckList approach for exploring and assessing the agent's inferences during tree search. The approach provides the user with an interface and general query-rule mechanism for identifying potential inference flaws and validating expected inference invariances. We present a user study involving knowledgeable AI researchers using the approach to evaluate an agent trained to play a complex real-time strategy game. The results show the approach is effective in allowing users to identify previously-unknown flaws in the agent's reasoning. In addition, our analysis provides insight into how AI experts use this type of testing approach, which may help improve future instantiations.
Submitted 7 June, 2022; v1 submitted 4 June, 2022;
originally announced June 2022.
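The abstract describes a query-rule mechanism for checking an agent's inferences during tree search. The sketch below is hypothetical and is not the paper's interface. Each search-tree node holds a predicted state (from the learned transition model) and a value (from the learned value function). A rule is a predicate over parent/child pairs that encodes an expected invariance, and a query walks the tree to collect every pair that violates the rule. The `units` state field and the example rule are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple

@dataclass
class Node:
    """Hypothetical search-tree node: a predicted state and its learned value estimate."""
    state: dict
    value: float
    children: List["Node"] = field(default_factory=list)

# A rule returns True when the parent -> child inference looks acceptable.
Rule = Callable[[Node, Node], bool]

def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of the search tree."""
    yield node
    for child in node.children:
        yield from walk(child)

def violations(root: Node, rule: Rule) -> List[Tuple[Node, Node]]:
    """Query: every (parent, child) edge in the tree that breaks the rule."""
    return [(p, c) for p in walk(root) for c in p.children if not rule(p, c)]

def losing_units_should_not_raise_value(parent: Node, child: Node) -> bool:
    """Example invariance (hypothetical): a predicted loss of own units should not raise the value."""
    if child.state["units"] < parent.state["units"]:
        return child.value <= parent.value
    return True
```

Writing rules as plain predicates means that the same traversal can check any invariance a user formulates. Only the predicate changes from one query to the next.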
-
DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps
Authors:
Donald Bertucci,
Md Montaser Hamid,
Yashwanthi Anand,
Anita Ruangrotsakun,
Delyar Tabatabai,
Melissa Perez,
Minsuk Kahng
Abstract:
In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely-used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. DendroMap is available at https://div-lab.github.io/dendromap/.
Submitted 15 August, 2022; v1 submitted 13 May, 2022;
originally announced May 2022.
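The abstract says DendroMap organizes images by extracting hierarchical cluster structures from high-dimensional representations. Here is a small sketch of that general idea. It assumes naive average-linkage agglomerative clustering on Euclidean distance; the entry above does not state the paper's exact clustering choices. The merge order is computed once. It can then be "cut" at different cluster counts, which loosely mirrors zooming between levels of abstraction. The function names are hypothetical, and the naive loop only suits toy inputs. For real embeddings, a library routine such as `scipy.cluster.hierarchy.linkage` would be the usual choice.

```python
import math
from itertools import combinations

def agglomerate(points):
    """Naive average-linkage agglomerative clustering.

    Returns the merges as (cluster_a, cluster_b) pairs of frozensets of point
    indices, in the order they happen (closest pair first).
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def avg_dist(a, b):
        # Mean Euclidean distance over all cross-cluster point pairs.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return merges

def cut(n, merges, k):
    """Replay merges from singletons until at most k clusters remain (one 'zoom level')."""
    clusters = {frozenset([i]) for i in range(n)}
    for a, b in merges:
        if len(clusters) <= k:
            break
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters
```

Take the points (0, 0), (0, 1), (10, 10), and (10, 11). Cutting at k = 2 yields the two nearby pairs, and cutting at k = 1 yields a single root cluster. A treemap can then size each cluster's rectangle by its member count.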