-
ChartQA-X: Generating Explanations for Charts
Authors:
Shamanthak Hegde,
Pooyan Fazli,
Hasti Seifi
Abstract:
The ability to interpret and explain complex information from visual data in charts is crucial for data-driven decision-making. In this work, we address the challenge of providing explanations alongside answering questions about chart images. We present ChartQA-X, a comprehensive dataset comprising various chart types with 28,299 contextually relevant questions, answers, and detailed explanations. These explanations are generated by prompting six different models and selecting the best responses based on metrics such as faithfulness, informativeness, coherence, and perplexity. Our experiments show that models fine-tuned on our dataset for explanation generation achieve superior performance across various metrics and demonstrate improved accuracy in question-answering tasks on new datasets. By integrating answers with explanatory narratives, our approach enhances the ability of intelligent agents to convey complex information effectively, improve user understanding, and foster trust in the generated responses.
Submitted 17 April, 2025;
originally announced April 2025.
-
Text Entry for XR Trove (TEXT): Collecting and Analyzing Techniques for Text Input in XR
Authors:
Arpit Bhatia,
Moaaz Hudhud Mughrabi,
Diar Abdlkarim,
Massimiliano Di Luca,
Mar Gonzalez-Franco,
Karan Ahuja,
Hasti Seifi
Abstract:
Text entry for extended reality (XR) is far from perfect, and a variety of text entry techniques (TETs) have been proposed to fit various contexts of use. However, comparing between TETs remains challenging due to the lack of a consolidated collection of techniques, and limited understanding of how interaction attributes of a technique (e.g., presence of visual feedback) impact user performance. To address these gaps, this paper examines the current landscape of XR TETs by creating a database of 176 different techniques. We analyze this database to highlight trends in the design of these techniques, the metrics used to evaluate them, and how various interaction attributes impact these metrics. We discuss implications for future techniques and present TEXT: Text Entry for XR Trove, an interactive online tool to navigate our database.
Submitted 14 March, 2025;
originally announced March 2025.
-
VideoA11y: Method and Dataset for Accessible Video Description
Authors:
Chaoyu Li,
Sid Padmanabhuni,
Maryam Cheema,
Hasti Seifi,
Pooyan Fazli
Abstract:
Video descriptions are crucial for blind and low vision (BLV) users to access visual content. However, current artificial intelligence models for generating descriptions often fall short due to limitations in the quality of human annotations within training datasets, resulting in descriptions that do not fully meet BLV users' needs. To address this gap, we introduce VideoA11y, an approach that leverages multimodal large language models (MLLMs) and video accessibility guidelines to generate descriptions tailored for BLV individuals. Using this method, we have curated VideoA11y-40K, the largest and most comprehensive dataset of 40,000 videos described for BLV users. Rigorous experiments across 15 video categories, involving 347 sighted participants, 40 BLV participants, and seven professional describers, showed that VideoA11y descriptions outperform novice human annotations and are comparable to trained human annotations in clarity, accuracy, objectivity, descriptiveness, and user satisfaction. We evaluated models on VideoA11y-40K using both standard and custom metrics, demonstrating that MLLMs fine-tuned on this dataset produce high-quality accessible descriptions. Code and dataset are available at https://people-robots.github.io/VideoA11y.
Submitted 27 February, 2025;
originally announced February 2025.
-
Can a Machine Feel Vibrations?: A Framework for Vibrotactile Sensation and Emotion Prediction via a Neural Network
Authors:
Chungman Lim,
Gyeongdeok Kim,
Su-Yeon Kang,
Hasti Seifi,
Gunhyuk Park
Abstract:
Vibrotactile signals offer new possibilities for conveying sensations and emotions in various applications. Yet, designing vibrotactile tactile icons (i.e., Tactons) to evoke specific feelings often requires a trial-and-error process and user studies. To support haptic design, we propose a framework for predicting sensory and emotional ratings from vibration signals. We created 154 Tactons and conducted a study to collect acceleration data from smartphones and roughness, valence, and arousal user ratings (n=36). We converted the Tacton signals into two-channel spectrograms reflecting the spectral sensitivities of mechanoreceptors, then input them into VibNet, our dual-stream neural network. The first stream captures sequential features using recurrent networks, while the second captures temporal-spectral features using 2D convolutional networks. VibNet outperformed baseline models, with 82% of its predictions falling within the standard deviations of ground truth user ratings for two new Tacton sets. We discuss the efficacy of our mechanoreceptive processing and dual-stream neural network and present future research directions.
Submitted 31 January, 2025;
originally announced February 2025.
-
Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals
Authors:
Maryam Cheema,
Hasti Seifi,
Pooyan Fazli
Abstract:
Audio descriptions (AD) make videos accessible for blind and low vision (BLV) users by describing visual elements that cannot be understood from the main audio track. AD created by professionals or novice describers is time-consuming and offers little customization or control to BLV viewers on description length and content and when they receive it. To address this gap, we explore user-driven AI-generated descriptions, enabling BLV viewers to control both the timing and level of detail of the descriptions they receive. In a study, 20 BLV participants activated audio descriptions for seven different video genres with two levels of detail: concise and detailed. Our findings reveal differences in the preferred frequency and level of detail of ADs for different videos, participants' sense of control with this style of AD delivery, and its limitations. We discuss the implications of these findings for the development of future AD tools for BLV users.
Submitted 27 May, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Grounding Emotional Descriptions to Electrovibration Haptic Signals
Authors:
Guimin Hu,
Zirui Zhao,
Lukas Heilmann,
Yasemin Vardar,
Hasti Seifi
Abstract:
Designing and displaying haptic signals with sensory and emotional attributes can improve the user experience in various applications. Free-form user language provides rich sensory and emotional information for haptic design (e.g., "This signal feels smooth and exciting"), but little work exists on linking user descriptions to haptic signals (i.e., language grounding). To address this gap, we conducted a study where 12 users described the feel of 32 signals perceived on a surface haptics (i.e., electrovibration) display. We developed a computational pipeline using natural language processing (NLP) techniques, such as GPT-3.5 Turbo and word embedding methods, to extract sensory and emotional keywords and group them into semantic clusters (i.e., concepts). We linked the keyword clusters to haptic signal features (e.g., pulse count) using correlation analysis. The proposed pipeline demonstrates the viability of a computational approach to analyzing haptic experiences. We discuss our future plans for creating a predictive model of haptic experience.
Submitted 4 November, 2024;
originally announced November 2024.
-
Retrieving Implicit and Explicit Emotional Events Using Large Language Models
Authors:
Guimin Hu,
Hasti Seifi
Abstract:
Large language models (LLMs) have garnered significant attention in recent years due to their impressive performance. While considerable research has evaluated these models from various perspectives, the extent to which LLMs can perform implicit and explicit emotion retrieval remains largely unexplored. To address this gap, this study investigates LLMs' emotion retrieval capabilities in commonsense. Through extensive experiments involving multiple models, we systematically evaluate the ability of LLMs on emotion retrieval. Specifically, we propose a supervised contrastive probing method to verify LLMs' performance for implicit and explicit emotion retrieval, as well as the diversity of the emotional events they retrieve. The results offer valuable insights into the strengths and limitations of LLMs in handling emotion retrieval.
Submitted 1 December, 2024; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Authors:
Guimin Hu,
Yi Xin,
Weimin Lyu,
Haojian Huang,
Chang Sun,
Zhihong Zhu,
Lin Gui,
Ruichu Cai,
Erik Cambria,
Hasti Seifi
Abstract:
Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions, especially in the text-dominated multimodal affective computing field. This survey presents recent trends in multimodal affective computing from an NLP perspective through four hot tasks: multimodal sentiment analysis, multimodal emotion recognition in conversation, multimodal aspect-based sentiment analysis, and multimodal multi-label emotion recognition. The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks, offering a comprehensive report on the recent progress in multimodal affective computing from an NLP perspective. This survey covers the formalization of tasks, provides an overview of relevant works, describes benchmark datasets, and details the evaluation metrics for each task. Additionally, it briefly discusses research in multimodal affective computing involving facial expressions, acoustic signals, physiological signals, and emotion causes. Finally, we discuss the technical approaches, challenges, and future directions in multimodal affective computing. To support further research, we released a repository that compiles related works in multimodal affective computing, providing detailed resources and references for the community.
Submitted 30 October, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
Hovering Over the Key to Text Input in XR
Authors:
Mar Gonzalez-Franco,
Diar Abdlkarim,
Arpit Bhatia,
Stuart Macgregor,
Jason Alexander Fotso-Puepi,
Eric J Gonzalez,
Hasti Seifi,
Massimiliano Di Luca,
Karan Ahuja
Abstract:
Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond the PC. Therefore, there is a critical need for improved text input solutions for XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboards (both physical and virtual) as essential tools. We discuss the unique challenges and opportunities presented by XR, synthesizing key trends from existing solutions.
Submitted 13 June, 2024;
originally announced June 2024.
-
An Interactive Tool for Simulating Mid-Air Ultrasound Tactons on the Skin
Authors:
Chungman Lim,
Hasti Seifi,
Gunhyuk Park
Abstract:
Mid-air ultrasound haptic technology offers a myriad of temporal and spatial parameters for contactless haptic design. Yet, predicting how these parameters interact to render an ultrasound signal is difficult before testing them on a mid-air ultrasound haptic device. Thus, haptic designers often use a trial-and-error process with different parameter combinations to obtain desired tactile patterns (i.e., Tactons) for user applications. We propose an interactive tool with five temporal and three spatiotemporal design parameters that can simulate the temporal and spectral properties of stimulation at specific skin points. As a preliminary verification, we measured vibrations induced from the ultrasound Tactons varying on one temporal and two spatiotemporal parameters. The measurements and simulation showed similar results for three different ultrasound rendering techniques, suggesting the efficacy of the simulation tool. We present key insights from the simulation and discuss future directions for enhancing the capabilities of simulations.
Submitted 5 May, 2024;
originally announced May 2024.
-
Designing Distinguishable Mid-Air Ultrasound Tactons with Temporal Parameters
Authors:
Chungman Lim,
Gunhyuk Park,
Hasti Seifi
Abstract:
Mid-air ultrasound technology offers new design opportunities for contactless tactile patterns (i.e., Tactons) in user applications. Yet, few guidelines exist for making ultrasound Tactons easy to distinguish for users. In this paper, we investigated the distinguishability of temporal parameters of ultrasound Tactons in five studies (n=72 participants). Study 1 established the discrimination thresholds for amplitude-modulated (AM) frequencies. In Studies 2-5, we investigated distinguishable ultrasound Tactons by creating four Tacton sets based on mechanical vibrations in the literature and collected similarity ratings for the ultrasound Tactons. We identified a subset of temporal parameters, such as rhythm and low envelope frequency, that could create distinguishable ultrasound Tactons. Also, a strong correlation (mean Spearman's $\rho$=0.75) existed between similarity ratings for ultrasound Tactons and similarities of mechanical Tactons from the literature, suggesting vibrotactile designers can transfer their knowledge to ultrasound design. We present design guidelines and future directions for creating distinguishable mid-air ultrasound Tactons.
Submitted 4 May, 2024;
originally announced May 2024.
-
AdapTics: A Toolkit for Creative Design and Integration of Real-Time Adaptive Mid-Air Ultrasound Tactons
Authors:
Kevin John,
Yinan Li,
Hasti Seifi
Abstract:
Mid-air ultrasound haptic technology can enhance user interaction and immersion in extended reality (XR) applications through contactless touch feedback. Yet, existing design tools for mid-air haptics primarily support creating tactile sensations (i.e., tactons) which cannot change at runtime. These tactons lack expressiveness in interactive scenarios where a continuous closed-loop response to user movement or environmental states is desirable. This paper introduces AdapTics, a toolkit featuring a graphical interface for rapid prototyping of adaptive tactons: dynamic sensations that can adjust at runtime based on user interactions, environmental changes, or other inputs. A software library and a Unity package accompany the graphical interface to enable integration of adaptive tactons in existing applications. We present the design space offered by AdapTics for creating adaptive mid-air ultrasound tactons and show the design tool can improve Creativity Support Index ratings for Exploration and Expressiveness in a user study with 12 XR and haptic designers.
Submitted 30 April, 2024;
originally announced April 2024.
-
UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause
Authors:
Guimin Hu,
Zhihong Zhu,
Daniel Hershcovich,
Lijie Hu,
Hasti Seifi,
Jiayuan Xie
Abstract:
Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) have recently garnered significant attention. Emotions are expressions of affect or feelings in response to specific events or situations, known as emotion causes. Both collectively explain the causality between human emotion and intents. However, existing works treat emotion recognition and emotion cause extraction as two individual problems, ignoring their natural causality. In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality between emotion and emotion cause. Concretely, UniMEEC reformulates the MERC and MECPE tasks as mask prediction problems and unifies them with a causal prompt template. To differentiate the modal effects, UniMEEC proposes a multimodal causal prompt to probe the pre-trained knowledge specific to each modality and implements cross-task and cross-modality interactions under task-oriented settings. Experimental results on four public benchmark datasets verify the model's performance on the MERC and MECPE tasks and show consistent improvements over previous state-of-the-art methods.
Submitted 9 October, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Clustering Social Touch Gestures for Human-Robot Interaction
Authors:
Ramzi Abou Chahine,
Steven Vasquez,
Pooyan Fazli,
Hasti Seifi
Abstract:
Social touch provides a rich non-verbal communication channel between humans and robots. Prior work has identified a set of touch gestures for human-robot interaction and described them with natural language labels (e.g., stroking, patting). Yet, no data exists on the semantic relationships between the touch gestures in users' minds. To endow robots with touch intelligence, we investigated how people perceive the similarities of social touch labels from the literature. In an online study, 45 participants grouped 36 social touch labels based on their perceived similarities and annotated their groupings with descriptive names. We derived quantitative similarities of the gestures from these groupings and analyzed the similarities using hierarchical clustering. The analysis resulted in 9 clusters of touch gestures formed around the social, emotional, and contact characteristics of the gestures. We discuss the implications of our results for designing and evaluating touch sensing and interactions with social robots.
Submitted 3 April, 2023;
originally announced April 2023.
-
Charting Visual Impression of Robot Hands
Authors:
Hasti Seifi,
Steven A. Vasquez,
Hyunyoung Kim,
Pooyan Fazli
Abstract:
A wide variety of robotic hands have been designed to date. Yet, we do not know how users perceive these hands and feel about interacting with them. To inform hand design for social robots, we compiled a dataset of 73 robot hands and ran an online study, in which 160 users rated their impressions of the hands using 17 rating scales. Next, we developed 17 regression models that can predict user ratings (e.g., humanlike) from the design features of the hands (e.g., number of fingers). The models have less than a 10-point error in predicting the user ratings on a 0-100 scale. The shape of the fingertips, color scheme, and size of the hands influence the user ratings the most. We present simple guidelines to improve user impression of robot hands and outline remaining questions for future work.
Submitted 17 November, 2022;
originally announced November 2022.
-
In the Arms of a Robot: Designing Autonomous Hugging Robots with Intra-Hug Gestures
Authors:
Alexis E. Block,
Hasti Seifi,
Otmar Hilliges,
Roger Gassert,
Katherine J. Kuchenbecker
Abstract:
Hugs are complex affective interactions that often include gestures like squeezes. We present six new guidelines for designing interactive hugging robots, which we validate through two studies with our custom robot. To achieve autonomy, we investigated robot responses to four human intra-hug gestures: holding, rubbing, patting, and squeezing. Thirty-two users each exchanged and rated sixteen hugs with an experimenter-controlled HuggieBot 2.0. The robot's inflated torso's microphone and pressure sensor collected data of the subjects' demonstrations that were used to develop a perceptual algorithm that classifies user actions with 88% accuracy. Users enjoyed robot squeezes, regardless of their performed action, they valued variety in the robot response, and they appreciated robot-initiated intra-hug gestures. From average user ratings, we created a probabilistic behavior algorithm that chooses robot responses in real time. We implemented improvements to the robot platform to create HuggieBot 3.0 and then validated its gesture perception system and behavior algorithm with sixteen users. The robot's responses and proactive gestures were greatly enjoyed. Users found the robot more natural, enjoyable, and intelligent in the last phase of the experiment than in the first. After the study, they felt more understood by the robot and thought robots were nicer to hug.
Submitted 20 February, 2022;
originally announced February 2022.