-
Data Analysis in the Era of Generative AI
Authors:
Jeevana Priya Inala,
Chenglong Wang,
Steven Drucker,
Gonzalo Ramos,
Victor Dibia,
Nathalie Riche,
Dave Brown,
Dan Marshall,
Jianfeng Gao
Abstract:
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of the data analysis workflow by translating high-level user intentions into executable code, charts, and insights. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps. Finally, we discuss the research challenges that impede the development of these AI-based systems, such as enhancing model capabilities, evaluating and benchmarking, and understanding end-user needs.
Submitted 27 September, 2024;
originally announced September 2024.
-
Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way
Authors:
Chenglong Wang,
Bongshin Lee,
Steven Drucker,
Dan Marshall,
Jianfeng Gao
Abstract:
Data analysts often need to iterate between data transformations and chart designs to create rich visualizations for exploratory data analysis. Although many AI-powered systems have been introduced to reduce the effort of visualization authoring, existing systems are not well suited for iterative authoring. They typically require analysts to provide, in a single turn, a text-only prompt that fully describes a complex visualization. We introduce Data Formulator 2 (DF2 for short), an AI-powered visualization system designed to overcome this limitation. DF2 blends graphical user interfaces and natural language inputs to enable users to convey their intent more effectively, while delegating data transformation to AI. Furthermore, to support efficient iteration, DF2 lets users navigate their iteration history and reuse previous designs, eliminating the need to start from scratch each time. A user study with eight participants demonstrated that DF2 allowed participants to develop their own iteration styles to complete challenging data exploration sessions.
Submitted 20 February, 2025; v1 submitted 28 August, 2024;
originally announced August 2024.
-
How Do Analysts Understand and Verify AI-Assisted Data Analyses?
Authors:
Ken Gu,
Ruoxi Shang,
Tim Althoff,
Chenglong Wang,
Steven M. Drucker
Abstract:
Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and how analysts' programming, analysis, and tool backgrounds reflect these behaviors. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences.
Submitted 4 March, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
On the Design of AI-powered Code Assistants for Notebooks
Authors:
Andrew M. McNutt,
Chenglong Wang,
Robert A. DeLine,
Steven M. Drucker
Abstract:
AI-powered code assistants, such as Copilot, are quickly becoming a ubiquitous component of contemporary coding contexts. Among these environments, computational notebooks, such as Jupyter, are of particular interest as they provide rich interface affordances that interleave code and output in a manner that allows for both exploratory and presentational work. Despite their popularity, little is known about the appropriate design of code assistants in notebooks. We investigate the potential of code assistants in computational notebooks by creating a design space (reified from a survey of extant tools) and through an interview-design study (with 15 practicing data scientists). Through this work, we identify challenges and opportunities for future systems in this space, such as the value of disambiguation for tasks like data visualization, the potential of tightly scoped domain-specific tools (like linters), and the importance of polite assistants.
Submitted 26 January, 2023;
originally announced January 2023.
-
Application of Stable Inversion to Flexible Manipulators Modeled by the ANCF
Authors:
Svenja Drücker,
Robert Seifried
Abstract:
Compared to conventional robots, flexible manipulators offer many advantages, such as faster end-effector velocities and less energy consumption. However, their flexible structure can lead to undesired oscillations. Therefore, the applied control strategy should account for these elasticities. A feedforward controller based on an inverse model of the system is an efficient way to improve the performance. However, unstable internal dynamics arise for many common flexible robots and stable inversion must be applied. In this contribution, an approximation of the original stable inversion approach is proposed. The approximation simplifies the problem setup, since the internal dynamics do not need to be derived explicitly for the definition of the boundary conditions. From a practical point of view, this makes the method applicable to more complex systems with many unactuated degrees of freedom. Flexible manipulators modeled by the absolute nodal coordinate formulation (ANCF) are considered as an application example.
Submitted 4 October, 2022;
originally announced October 2022.
-
Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations
Authors:
Arjun Srinivasan,
Nikhila Nyapathy,
Bongshin Lee,
Steven M. Drucker,
John Stasko
Abstract:
Natural language interfaces (NLIs) for data visualization are becoming increasingly popular both in academic research and in commercial software. Yet, there is a lack of empirical understanding of how people specify visualizations through natural language. To bridge this gap, we conducted an online study with 102 participants. We showed participants a series of ten visualizations for a given dataset and asked them to provide utterances they would pose to generate the displayed charts. The curated list of utterances generated from the study is provided below. This corpus of utterances can be used to evaluate existing NLIs for data visualization as well as for creating new systems and models to generate visualizations from natural language utterances.
Submitted 1 October, 2021;
originally announced October 2021.
-
Data Visceralization: Enabling Deeper Understanding of Data Using Virtual Reality
Authors:
Benjamin Lee,
Dave Brown,
Bongshin Lee,
Christophe Hurter,
Steven Drucker,
Tim Dwyer
Abstract:
A fundamental part of data visualization is transforming data to map abstract information onto visual attributes. While this abstraction is a powerful basis for data visualization, the connection between the representation and the original underlying data (i.e., what the quantities and measurements actually correspond with in reality) can be lost. On the other hand, virtual reality (VR) is being increasingly used to represent real and abstract models as natural experiences to users. In this work, we explore the potential of using VR to help restore the basic understanding of units and measures that are often abstracted away in data visualization in an approach we call data visceralization. By building VR prototypes as design probes, we identify key themes and factors for data visceralization. We do this first through a critical reflection by the authors, then by involving external participants. We find that data visceralization is an engaging way of understanding the qualitative aspects of physical measures and their real-life form, which complements analytical and quantitative understanding commonly gained from data visualization. However, data visceralization is most effective when there is a one-to-one mapping between data and representation, with transformations such as scaling affecting this understanding. We conclude with a discussion of future directions for data visceralization.
Submitted 11 November, 2020; v1 submitted 31 August, 2020;
originally announced September 2020.
-
InChorus: Designing Consistent Multimodal Interactions for Data Visualization on Tablet Devices
Authors:
Arjun Srinivasan,
Bongshin Lee,
Nathalie Henry Riche,
Steven M. Drucker,
Ken Hinckley
Abstract:
While tablet devices are a promising platform for data visualization, supporting consistent interactions across different types of visualizations on tablets remains an open challenge. In this paper, we present multimodal interactions that function consistently across different visualizations, supporting common operations during visual data analysis. By considering standard interface elements (e.g., axes, marks) and grounding our design in a set of core concepts including operations, parameters, targets, and instruments, we systematically develop interactions applicable to different visualization types. To exemplify how the proposed interactions collectively facilitate data exploration, we employ them in a tablet-based system, InChorus, that supports pen, touch, and speech input. Based on a study with 12 participants performing replication and fact-checking tasks with InChorus, we discuss how participants adapted to using multimodal input and highlight considerations for future multimodal visualization systems.
Submitted 17 January, 2020;
originally announced January 2020.
-
A System for Real-Time Interactive Analysis of Deep Learning Training
Authors:
Shital Shah,
Roland Fernandez,
Steven Drucker
Abstract:
Performing diagnosis or exploratory analysis during the training of deep learning models is challenging but often necessary for making a sequence of decisions guided by incremental observations. Currently available systems for this purpose are limited to monitoring only the logged data that must be specified before the training process starts. Each time new information is desired, a cycle of stop-change-restart is required in the training process. These limitations make interactive exploration and diagnosis tasks difficult, imposing long, tedious iterations during model development. We present a new system that enables users to perform interactive queries on live processes, generating real-time information that can be rendered in multiple formats on multiple surfaces in the form of several desired visualizations simultaneously. To achieve this, we model various exploratory inspection and diagnostic tasks for deep learning training processes as specifications for streams using a map-reduce paradigm with which many data scientists are already familiar. Our design achieves generality and extensibility by defining composable primitives, which is a fundamentally different approach from that used by currently available systems. The open source implementation of our system is available as the TensorWatch project at https://github.com/microsoft/tensorwatch.
Submitted 7 January, 2020; v1 submitted 5 January, 2020;
originally announced January 2020.