-
Communicating Through Avatars in Industry 5.0: A Focus Group Study on Human-Robot Collaboration
Authors:
Stina Klein,
Pooja Prajod,
Katharina Weitz,
Matteo Lavit Nicora,
Dimitra Tsovaltzi,
Elisabeth André
Abstract:
The integration of collaborative robots (cobots) in industrial settings raises concerns about worker well-being, particularly due to reduced social interactions. Avatars - designed to facilitate worker interactions and engagement - are promising solutions to enhance the human-robot collaboration (HRC) experience. However, real-world perspectives on avatar-supported HRC remain unexplored. To addres…
▽ More
The integration of collaborative robots (cobots) in industrial settings raises concerns about worker well-being, particularly due to reduced social interactions. Avatars - designed to facilitate worker interactions and engagement - are promising solutions to enhance the human-robot collaboration (HRC) experience. However, real-world perspectives on avatar-supported HRC remain unexplored. To address this gap, we conducted a focus group study with employees from a German manufacturing company that uses cobots. Before the discussion, participants engaged with a scripted, industry-like HRC demo in a lab setting. This qualitative approach provided valuable insights into the avatar's potential roles, improvements to its behavior, and practical considerations for deploying them in industrial workcells. Our findings also emphasize the importance of personalized communication and task assistance. Although our study's limitations restrict its generalizability, it serves as an initial step in recognizing the potential of adaptive, context-aware avatar interactions in real-world industrial environments.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge
Authors:
Siyang Song,
Micol Spitale,
Xiangyu Kong,
Hengde Zhu,
Cheng Luo,
Cristina Palmero,
German Barquero,
Sergio Escalera,
Michel Valstar,
Mohamed Daoudi,
Tobias Baur,
Fabien Ringeval,
Andrew Howes,
Elisabeth Andre,
Hatice Gunes
Abstract:
In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we are proposing the REACT 2025 challenge encouraging the development and benchmarking of Machine Learning (ML) models that can be used to generate multiple appropriate, diverse, re…
▽ More
In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we are proposing the REACT 2025 challenge encouraging the development and benchmarking of Machine Learning (ML) models that can be used to generate multiple appropriate, diverse, realistic and synchronised human-style facial reactions expressed by human listeners in response to an input stimulus (i.e., audio-visual behaviours expressed by their corresponding speakers). As a key of the challenge, we provide challenge participants with the first natural and large-scale multi-modal MAFRG dataset (called MARS) recording 137 human-human dyadic interactions containing a total of 2856 interaction sessions covering five different topics. In addition, this paper also presents the challenge guidelines and the performance of our baselines on the two proposed sub-challenges: Offline MAFRG and Online MAFRG, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2025
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
AI-Based Feedback in Counselling Competence Training of Prospective Teachers
Authors:
Tobias Hallmen,
Kathrin Gietl,
Karoline Hillesheim,
Moritz Bauermann,
Annemarie Friedrich,
Elisabeth André
Abstract:
This study explores the use of AI-based feedback to enhance the counselling competence of prospective teachers. An iterative block seminar was designed, incorporating theoretical foundations, practical applications, and AI tools for analysing verbal, paraverbal, and nonverbal communication. The seminar included recorded simulated teacher-parent conversations, followed by AI-based feedback and qual…
▽ More
This study explores the use of AI-based feedback to enhance the counselling competence of prospective teachers. An iterative block seminar was designed, incorporating theoretical foundations, practical applications, and AI tools for analysing verbal, paraverbal, and nonverbal communication. The seminar included recorded simulated teacher-parent conversations, followed by AI-based feedback and qualitative interviews with students. The study investigated correlations between communication characteristics and conversation quality, student perceptions of AI-based feedback, and the training of AI models to identify conversation phases and techniques. Results indicated significant correlations between nonverbal and paraverbal features and conversation quality, and students positively perceived the AI feedback. The findings suggest that AI-based feedback can provide objective, actionable insights to improve teacher training programs. Future work will focus on refining verbal skill annotations, expanding the dataset, and exploring additional features to enhance the feedback system.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
Semantic Matters: Multimodal Features for Affective Analysis
Authors:
Tobias Hallmen,
Robin-Nico Kampa,
Fabian Deuser,
Norbert Oswald,
Elisabeth André
Abstract:
In this study, we present our methodology for two tasks: the Emotional Mimicry Intensity (EMI) Estimation Challenge and the Behavioural Ambivalence/Hesitancy (BAH) Recognition Challenge, both conducted as part of the 8th Workshop and Competition on Affective & Behavior Analysis in-the-wild. We utilize a Wav2Vec 2.0 model pre-trained on a large podcast dataset to extract various audio features, cap…
▽ More
In this study, we present our methodology for two tasks: the Emotional Mimicry Intensity (EMI) Estimation Challenge and the Behavioural Ambivalence/Hesitancy (BAH) Recognition Challenge, both conducted as part of the 8th Workshop and Competition on Affective & Behavior Analysis in-the-wild. We utilize a Wav2Vec 2.0 model pre-trained on a large podcast dataset to extract various audio features, capturing both linguistic and paralinguistic information. Our approach incorporates a valence-arousal-dominance (VAD) module derived from Wav2Vec 2.0, a BERT text encoder, and a vision transformer (ViT) with predictions subsequently processed through a long short-term memory (LSTM) architecture or a convolution-like method for temporal modeling. We integrate the textual and visual modality into our analysis, recognizing that semantic content provides valuable contextual cues and underscoring that the meaning of speech often conveys more critical insights than its acoustic counterpart alone. Fusing in the vision modality helps in some cases to interpret the textual modality more precisely. This combined approach results in significant performance improvements, achieving in EMI $ρ_{\text{TEST}} = 0.706$ and in BAH $F1_{\text{TEST}} = 0.702$, securing first place in the EMI challenge and second place in the BAH challenge.
△ Less
Submitted 18 April, 2025; v1 submitted 16 March, 2025;
originally announced April 2025.
-
CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding
Authors:
Johannes Kirmayr,
Lukas Stappen,
Phillip Schneider,
Florian Matthes,
Elisabeth André
Abstract:
In today's assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle with retaining user preferences, leading to repetitive user requests and disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant concerns about privacy and trust, espe…
▽ More
In today's assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle with retaining user preferences, leading to repetitive user requests and disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant concerns about privacy and trust, especially in regions with stringent regulations like Europe. In response to these challenges, we propose a long-term memory system for voice assistants, structured around predefined categories. This approach leverages Large Language Models to efficiently extract, store, and retrieve preferences within these categories, ensuring both personalisation and transparency. We also introduce a synthetic multi-turn, multi-session conversation dataset (CarMem), grounded in real industry data, tailored to an in-car voice assistant setting. Benchmarked on the dataset, our system achieves an F1-score of .78 to .95 in preference extraction, depending on category granularity. Our maintenance strategy reduces redundant preferences by 95% and contradictory ones by 92%, while the accuracy of optimal retrieval is at .87. Collectively, the results demonstrate the system's suitability for industrial applications.
△ Less
Submitted 16 January, 2025;
originally announced January 2025.
-
Tuning Trains Speed in Railway Scheduling
Authors:
Étienne André
Abstract:
Railway scheduling consists in ensuring that a set of trains evolve in a shared rail network without collisions, while meeting schedule constraints. This problem is notoriously difficult, even more in the case of uncertain or even unknown train speeds. We propose here a modeling and verification approach for railway scheduling in the presence of uncertain speeds, encoded here as uncertain segment…
▽ More
Railway scheduling consists in ensuring that a set of trains evolve in a shared rail network without collisions, while meeting schedule constraints. This problem is notoriously difficult, even more in the case of uncertain or even unknown train speeds. We propose here a modeling and verification approach for railway scheduling in the presence of uncertain speeds, encoded here as uncertain segment durations. We formalize the system and propose a formal translation to parametric timed automata. As a proof of concept, we apply our approach to benchmarks, for which we synthesize using IMITATOR suitable valuations for the segment durations.
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
Estimating the persistent homology of $\mathbb{R}^n$-valued functions using function-geometric multifiltrations
Authors:
Ethan André,
Jingyi Li,
Steve Oudot
Abstract:
Given an unknown $\mathbb{R}^n$-valued function $f$ on a metric space $X$, can we approximate the persistent homology of $f$ from a finite sampling of $X$ with known pairwise distances and function values? This question has been answered in the case $n=1$, assuming $f$ is Lipschitz continuous and $X$ is a sufficiently regular geodesic metric space, and using filtered geometric complexes with fixed…
▽ More
Given an unknown $\mathbb{R}^n$-valued function $f$ on a metric space $X$, can we approximate the persistent homology of $f$ from a finite sampling of $X$ with known pairwise distances and function values? This question has been answered in the case $n=1$, assuming $f$ is Lipschitz continuous and $X$ is a sufficiently regular geodesic metric space, and using filtered geometric complexes with fixed scale parameter for the approximation. In this paper we answer the question for arbitrary $n$, under similar assumptions and using function-geometric multifiltrations. Our analysis offers a different view on these multifiltrations by focusing on their approximation properties rather than on their stability properties. We also leverage the multiparameter setting to provide insight into the influence of the scale parameter, whose choice is central to this type of approach.
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents
Authors:
Jean Vassoyan,
Anan Schütt,
Jill-Jênn Vie,
Arun-Balajiee Lekshmi-Narayanan,
Elisabeth André,
Nicolas Vayatis
Abstract:
Massive Open Online Courses (MOOCs) have greatly contributed to making education more accessible. However, many MOOCs maintain a rigid, one-size-fits-all structure that fails to address the diverse needs and backgrounds of individual learners. Learning path personalization aims to address this limitation, by tailoring sequences of educational content to optimize individual student learning outcome…
▽ More
Massive Open Online Courses (MOOCs) have greatly contributed to making education more accessible. However, many MOOCs maintain a rigid, one-size-fits-all structure that fails to address the diverse needs and backgrounds of individual learners. Learning path personalization aims to address this limitation, by tailoring sequences of educational content to optimize individual student learning outcomes. Existing approaches, however, often require either massive student interaction data or extensive expert annotation, limiting their broad application. In this study, we introduce a novel data-efficient framework for learning path personalization that operates without expert annotation. Our method employs a flexible recommender system pre-trained with reinforcement learning on a dataset of raw course materials. Through experiments on semi-synthetic data, we show that this pre-training stage substantially improves data-efficiency in a range of adaptive learning scenarios featuring new educational materials. This opens up new perspectives for the design of foundation models for adaptive learning.
△ Less
Submitted 18 November, 2024;
originally announced November 2024.
-
Exploring the Impact of Non-Verbal Virtual Agent Behavior on User Engagement in Argumentative Dialogues
Authors:
Annalena Bea Aicher,
Yuki Matsuda,
Keichii Yasumoto,
Wolfgang Minker,
Elisabeth André,
Stefan Ultes
Abstract:
Engaging in discussions that involve diverse perspectives and exchanging arguments on a controversial issue is a natural way for humans to form opinions. In this process, the way arguments are presented plays a crucial role in determining how engaged users are, whether the interaction takes place solely among humans or within human-agent teams. This is of great importance as user engagement plays…
▽ More
Engaging in discussions that involve diverse perspectives and exchanging arguments on a controversial issue is a natural way for humans to form opinions. In this process, the way arguments are presented plays a crucial role in determining how engaged users are, whether the interaction takes place solely among humans or within human-agent teams. This is of great importance as user engagement plays a crucial role in determining the success or failure of cooperative argumentative discussions. One main goal is to maintain the user's motivation to participate in a reflective opinion-building process, even when addressing contradicting viewpoints. This work investigates how non-verbal agent behavior, specifically co-speech gestures, influences the user's engagement and interest during an ongoing argumentative interaction. The results of a laboratory study conducted with 56 participants demonstrate that the agent's co-speech gestures have a substantial impact on user engagement and interest and the overall perception of the system. Therefore, this research offers valuable insights for the design of future cooperative argumentative virtual agents.
△ Less
Submitted 17 November, 2024;
originally announced November 2024.
-
DJ Mix Transcription with Multi-Pass Non-Negative Matrix Factorization
Authors:
Étienne Paul André,
Dominique Fourer,
Diemo Schwarz
Abstract:
DJ mix transcription is a crucial step towards DJ mix reverse engineering, which estimates the set of parameters and audio effects applied to a set of existing tracks to produce a performative DJ mix. We introduce a new approach based on a multi-pass NMF algorithm where the dictionary matrix corresponds to a set of spectrogram slices of the source tracks present in the mix.
The multi-pass strate…
▽ More
DJ mix transcription is a crucial step towards DJ mix reverse engineering, which estimates the set of parameters and audio effects applied to a set of existing tracks to produce a performative DJ mix. We introduce a new approach based on a multi-pass NMF algorithm where the dictionary matrix corresponds to a set of spectrogram slices of the source tracks present in the mix.
The multi-pass strategy is motivated by the high computational cost resulting from the use of a large NMF dictionary. The proposed method uses inter-pass filtering to favor temporal continuity and sparseness and is evaluated on a publicly available dataset.
Our comparative results considering a baseline method based on dynamic time warping (DTW) are promising and pave the way of future NMF-based applications.
△ Less
Submitted 5 October, 2024;
originally announced October 2024.
-
Execution-time opacity problems in one-clock parametric timed automata
Authors:
Étienne André,
Johan Arcile,
Engel Lefaucheux
Abstract:
Parametric timed automata (PTAs) extend the concept of timed automata, by allowing timing delays not only specified by concrete values but also by parameters, allowing the analysis of systems with uncertainty regarding timing behaviors. The full execution-time opacity is defined as the problem in which an attacker must never be able to deduce whether some private location was visited, by only obse…
▽ More
Parametric timed automata (PTAs) extend the concept of timed automata, by allowing timing delays not only specified by concrete values but also by parameters, allowing the analysis of systems with uncertainty regarding timing behaviors. The full execution-time opacity is defined as the problem in which an attacker must never be able to deduce whether some private location was visited, by only observing the execution time. The problem of full ET-opacity emptiness (i.e., the emptiness over the parameter valuations for which full execution-time opacity is satisfied) is known to be undecidable for general PTAs. We therefore focus here on one-clock PTAs with integer-valued parameters over dense time. We show that the full ET-opacity emptiness is undecidable for a sufficiently large number of parameters, but is decidable for a single parameter, and exact synthesis can be effectively achieved. Our proofs rely on a novel construction as well as on variants of Presburger arithmetics. We finally prove an additional decidability result on an existential variant of execution-time opacity.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement
Authors:
Marius Funk,
Shogo Okada,
Elisabeth André
Abstract:
Non-verbal behavior is a central challenge in understanding the dynamics of a conversation and the affective states between interlocutors arising from the interaction. Although psychological research has demonstrated that non-verbal behaviors vary across cultures, limited computational analysis has been conducted to clarify these differences and assess their impact on engagement recognition. To ga…
▽ More
Non-verbal behavior is a central challenge in understanding the dynamics of a conversation and the affective states between interlocutors arising from the interaction. Although psychological research has demonstrated that non-verbal behaviors vary across cultures, limited computational analysis has been conducted to clarify these differences and assess their impact on engagement recognition. To gain a greater understanding of engagement and non-verbal behaviors among a wide range of cultures and language spheres, in this study we conduct a multilingual computational analysis of non-verbal features and investigate their role in engagement and engagement prediction. To achieve this goal, we first expanded the NoXi dataset, which contains interaction data from participants living in France, Germany, and the United Kingdom, by collecting session data of dyadic conversations in Japanese and Chinese, resulting in the enhanced dataset NoXi+J. Next, we extracted multimodal non-verbal features, including speech acoustics, facial expressions, backchanneling and gestures, via various pattern recognition techniques and algorithms. Then, we conducted a statistical analysis of listening behaviors and backchannel patterns to identify culturally dependent and independent features in each language and common features among multiple languages. These features were also correlated with the engagement shown by the interlocutors. Finally, we analyzed the influence of cultural differences in the input features of LSTM models trained to predict engagement for five language datasets. A SHAP analysis combined with transfer learning confirmed a considerable correlation between the importance of input features for a language set and the significant cultural characteristics analyzed.
△ Less
Submitted 9 September, 2024;
originally announced September 2024.
-
MITHOS: Interactive Mixed Reality Training to Support Professional Socio-Emotional Interactions at Schools
Authors:
Lara Chehayeb,
Chirag Bhuvaneshwara,
Manuel Anglet,
Bernhard Hilpert,
Ann-Kristin Meyer,
Dimitra Tsovaltzi,
Patrick Gebhard,
Antje Biermann,
Sinah Auchtor,
Nils Lauinger,
Julia Knopf,
Andreas Kaiser,
Fabian Kersting,
Gregor Mehlmann,
Florian Lingenfelser,
Elisabeth André
Abstract:
Teachers in challenging conflict situations often experience shame and self-blame, which relate to the feeling of incompetence but may externalise as anger. Sensing mixed signals fails the contingency rule for developing affect regulation and may result in confusion for students about their own emotions and hinder their emotion regulation. Therefore, being able to constructively regulate emotions…
▽ More
Teachers in challenging conflict situations often experience shame and self-blame, which relate to the feeling of incompetence but may externalise as anger. Sensing mixed signals fails the contingency rule for developing affect regulation and may result in confusion for students about their own emotions and hinder their emotion regulation. Therefore, being able to constructively regulate emotions not only benefits individual experience of emotions but also fosters effective interpersonal emotion regulation and influences how a situation is managed. MITHOS is a system aimed at training teachers' conflict resolution skills through realistic situative learning opportunities during classroom conflicts. In four stages, MITHOS supports teachers' socio-emotional self-awareness, perspective-taking and positive regard. It provides: a) a safe virtual environment to train free social interaction and receive natural social feedback from reciprocal student-agent reactions, b) spatial situational perspective taking through an avatar, c) individual virtual reflection guidance on emotional experiences through co-regulation processes, and d) expert feedback on professional behavioural strategies. This chapter presents the four stages and their implementation in a semi-automatic Wizard-of-Oz (WoZ) System. The WoZ system affords collecting data that are used for developing the fully automated hybrid (machine learning and model-based) system, and to validate the underlying psychological and conflict resolution models. We present results validating the approach in terms of scenario realism, as well as a systematic testing of the effects of external avatar similarity on antecedents of self-awareness with behavior similarity. The chapter contributes to a common methodology of conducting interdisciplinary research for human-centered and generalisable XR and presents a system designed to support it.
△ Less
Submitted 2 September, 2024;
originally announced September 2024.
-
Execution-time opacity control for timed automata
Authors:
Étienne André,
Marie Duflot,
Laetitia Laversa,
Engel Lefaucheux
Abstract:
Timing leaks in timed automata (TA) can occur whenever an attacker is able to deduce a secret by observing some timed behavior. In execution-time opacity, the attacker aims at deducing whether a private location was visited, by observing only the execution time. It can be decided whether a TA is opaque in this setting. In this work, we tackle control, and show that we are able to decide whether a…
▽ More
Timing leaks in timed automata (TA) can occur whenever an attacker is able to deduce a secret by observing some timed behavior. In execution-time opacity, the attacker aims at deducing whether a private location was visited, by observing only the execution time. It can be decided whether a TA is opaque in this setting. In this work, we tackle control, and show that we are able to decide whether a TA can be controlled at runtime to ensure opacity. Our method is constructive, in the sense that we can exhibit such a controller. We also address the case when the attacker cannot have an infinite precision in its observations.
△ Less
Submitted 5 December, 2024; v1 submitted 16 September, 2024;
originally announced September 2024.
-
MultiMediate'24: Multi-Domain Engagement Estimation
Authors:
Philipp Müller,
Michal Balazia,
Tobias Baur,
Michael Dietz,
Alexander Heimerl,
Anna Penzkofer,
Dominik Schiller,
François Brémond,
Jan Alexandersson,
Elisabeth André,
Andreas Bulling
Abstract:
Estimating the momentary level of participant's engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate'…
▽ More
Estimating the momentary level of participant's engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate'24, we present the first challenge addressing multi-domain engagement estimation. As training data, we utilise the NOXI database of dyadic novice-expert interactions. In addition to within-domain test data, we add two new test domains. First, we introduce recordings following the NOXI protocol but covering languages that are not present in the NOXI training data. Second, we collected novel engagement annotations on the MPIIGroupInteraction dataset which consists of group discussions between three to four people. In this way, MultiMediate'24 evaluates the ability of approaches to generalise across factors such as language and cultural background, group size, task, and screen-mediated vs. face-to-face interaction. This paper describes the MultiMediate'24 challenge and presents baseline results. In addition, we discuss selected challenge solutions.
△ Less
Submitted 29 August, 2024;
originally announced August 2024.
-
The Bright Side of Timed Opacity
Authors:
Étienne André,
Sarah Dépernet,
Engel Lefaucheux
Abstract:
In 2009, Franck Cassez showed that the timed opacity problem, where an attacker can observe some actions with their timestamps and attempts to deduce information, is undecidable for timed automata (TAs). Moreover, he showed that the undecidability holds even for subclasses such as event-recording automata. In this article, we consider the same definition of opacity for several other subclasses of…
▽ More
In 2009, Franck Cassez showed that the timed opacity problem, where an attacker can observe some actions with their timestamps and attempts to deduce information, is undecidable for timed automata (TAs). Moreover, he showed that the undecidability holds even for subclasses such as event-recording automata. In this article, we consider the same definition of opacity for several other subclasses of TAs: with restrictions on the number of clocks, of actions, on the nature of time, or on a new subclass called observable event-recording automata. We show that opacity can mostly be retrieved, except for one-action TAs and for one-clock TAs with $ε$-transitions, for which undecidability remains. We then exhibit a new decidable subclass in which the number of observations made by the attacker is limited.
△ Less
Submitted 27 September, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
VoiceX: A Text-To-Speech Framework for Custom Voices
Authors:
Silvan Mertes,
Daksitha Withanage Don,
Otto Grothe,
Johanna Kuch,
Ruben Schlagowski,
Elisabeth André
Abstract:
Modern TTS systems are capable of creating highly realistic and natural-sounding speech. Despite these developments, the process of customizing TTS voices remains a complex task, mostly requiring the expertise of specialists within the field. One reason for this is the utilization of deep learning models, which are characterized by their expansive, non-interpretable parameter spaces, restricting t…
▽ More
Modern TTS systems are capable of creating highly realistic and natural-sounding speech. Despite these developments, the process of customizing TTS voices remains a complex task, mostly requiring the expertise of specialists within the field. One reason for this is the utilization of deep learning models, which are characterized by their expansive, non-interpretable parameter spaces, restricting the feasibility of manual customization. In this paper, we present a novel human-in-the-loop paradigm based on an evolutionary algorithm for directly interacting with the parameter space of a neural TTS model. We integrated our approach into a user-friendly graphical user interface that allows users to efficiently create original voices. Those voices can then be used with the backbone TTS model, for which we provide a Python API. Further, we present the results of a user study exploring the capabilities of VoiceX. We show that VoiceX is an appropriate tool for creating individual, custom voices.
△ Less
Submitted 22 August, 2024;
originally announced August 2024.
-
Parameterized Verification of Timed Networks with Clock Invariants
Authors:
Étienne André,
Swen Jacobs,
Shyam Lal Karra,
Ocan Sankur
Abstract:
We consider parameterized verification problems for networks of timed automata (TAs) that communicate via disjunctive guards or lossy broadcast. To this end, we first consider disjunctive timed networks (DTNs), i.e., networks of TAs that communicate via location guards that enable a transition only if there is another process in a certain location. We solve for the first time the general case with…
▽ More
We consider parameterized verification problems for networks of timed automata (TAs) that communicate via disjunctive guards or lossy broadcast. To this end, we first consider disjunctive timed networks (DTNs), i.e., networks of TAs that communicate via location guards that enable a transition only if there is another process in a certain location. We solve for the first time the general case with clock invariants, and establish the decidability of the parameterized verification problem for local trace properties and for reachability of global configurations; Moreover, we prove that, surprisingly and unlike in other settings, this model is equivalent to lossy broadcast networks.
△ Less
Submitted 9 August, 2024;
originally announced August 2024.
-
Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models
Authors:
Philipp Müller,
Alexander Heimerl,
Sayed Muddashir Hossain,
Lea Siegel,
Jan Alexandersson,
Patrick Gebhard,
Elisabeth André,
Tanja Schneeberger
Abstract:
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules. For affective computing systems, an understanding of how users regulate their emotions can be highly useful, for example to provide feedback in job interview training, or in psychotherapeutic scenarios. However, at present no method to automatically classify different emotion re…
▽ More
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules. For affective computing systems, an understanding of how users regulate their emotions can be highly useful, for example to provide feedback in job interview training, or in psychotherapeutic scenarios. However, at present no method to automatically classify different emotion regulation strategies in a cross-user scenario exists. At the same time, recent studies showed that instruction-tuned Large Language Models (LLMs) can reach impressive performance across a variety of affect recognition tasks such as categorical emotion recognition or sentiment analysis. While these results are promising, it remains unclear to what extent the representational power of LLMs can be utilized in the more subtle task of classifying users' internal emotion regulation strategy. To close this gap, we make use of the recently introduced \textsc{Deep} corpus for modeling the social display of the emotion shame, where each point in time is annotated with one of seven different emotion regulation classes. We fine-tune Llama2-7B as well as the recently introduced Gemma model using Low-rank Optimization on prompts generated from different sources of information on the \textsc{Deep} corpus. These include verbal and nonverbal behavior, person factors, as well as the results of an in-depth interview after the interaction. Our results show, that a fine-tuned Llama2-7B LLM is able to classify the utilized emotion regulation strategy with high accuracy (0.84) without needing access to data from post-interaction interviews. This represents a significant improvement over previous approaches based on Bayesian Networks and highlights the importance of modeling verbal behavior in emotion regulation.
△ Less
Submitted 8 August, 2024;
originally announced August 2024.
-
Hyper parametric timed CTL
Authors:
Masaki Waga,
Étienne André
Abstract:
Hyperproperties enable simultaneous reasoning about multiple execution traces of a system and are useful to reason about non-interference, opacity, robustness, fairness, observational determinism, etc. We introduce hyper parametric timed computation tree logic (HyperPTCTL), extending hyperlogics with timing reasoning and, notably, parameters to express unknown values. We mainly consider its nest-f…
▽ More
Hyperproperties enable simultaneous reasoning about multiple execution traces of a system and are useful to reason about non-interference, opacity, robustness, fairness, observational determinism, etc. We introduce hyper parametric timed computation tree logic (HyperPTCTL), extending hyperlogics with timing reasoning and, notably, parameters to express unknown values. We mainly consider its nest-free fragment, where temporal operators cannot be nested. However, we allow extensions that enable counting actions and comparing the duration since the most recent occurrence of specific actions. We show that our nest-free fragment with this extension is sufficiently expressive to encode properties, e.g., opacity, (un)fairness, or robust observational (non-)determinism. We propose semi-algorithms for model checking and synthesis of parametric timed automata (an extension of timed automata with timing parameters) against this nest-free fragment with the extension via reduction to PTCTL model checking and synthesis. While the general model checking (and thus synthesis) problem is undecidable, we show that a large part of our extended (yet nest-free) fragment is decidable, provided the parameters only appear in the property, not in the model. We also exhibit additional decidable fragments where parameters within the model are allowed. We implemented our semi-algorithms on top of the IMITATOR model checker, and performed experiments. Our implementation supports most of the nest-free fragments (beyond the decidable classes). The experimental results highlight our method's practical relevance.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
Formalizing UML State Machines for Automated Verification -- A Survey
Authors:
Étienne André,
Shuang Liu,
Yang Liu,
Christine Choppy,
Jun Sun,
Jin Song Dong
Abstract:
The Unified Modeling Language (UML) is a standard for modeling dynamic systems. UML behavioral state machines are used for modeling the dynamic behavior of object-oriented designs. The UML specification, maintained by the Object Management Group (OMG), is documented in natural language (in contrast to formal language). The inherent ambiguity of natural languages may introduce inconsistencies in th…
▽ More
The Unified Modeling Language (UML) is a standard for modeling dynamic systems. UML behavioral state machines are used for modeling the dynamic behavior of object-oriented designs. The UML specification, maintained by the Object Management Group (OMG), is documented in natural language (in contrast to formal language). The inherent ambiguity of natural languages may introduce inconsistencies in the resulting state machine model. Formalizing UML state machine specification aims at solving the ambiguity problem and at providing a uniform view to software designers and developers. Such a formalization also aims at providing a foundation for automatic verification of UML state machine models, which can help to find software design vulnerabilities at an early stage and reduce the development cost. We provide here a comprehensive survey of existing work from 1997 to 2021 related to formalizing UML state machine semantics for the purpose of conducting model checking at the design stage.
△ Less
Submitted 24 July, 2024;
originally announced July 2024.
-
MoULDyS: Monitoring of Autonomous Systems in the Presence of Uncertainties
Authors:
Bineet Ghosh,
Étienne André
Abstract:
We introduce MoULDyS, that implements efficient offline and online monitoring algorithms of black-box cyber-physical systems w.r.t. safety properties. MoULDyS takes as input an uncertain log (with noisy and missing samples), as well as a bounding model in the form of an uncertain linear system; this latter model plays the role of an over-approximation so as to reduce the number of false alarms. Mo…
▽ More
We introduce MoULDyS, that implements efficient offline and online monitoring algorithms of black-box cyber-physical systems w.r.t. safety properties. MoULDyS takes as input an uncertain log (with noisy and missing samples), as well as a bounding model in the form of an uncertain linear system; this latter model plays the role of an over-approximation so as to reduce the number of false alarms. MoULDyS is Python-based and available under the GNU General Public License v3.0 (gpl-3.0). We further provide easy-to-use scripts to recreate the results of two case studies introduced in an earlier work.
△ Less
Submitted 24 July, 2024;
originally announced July 2024.
-
DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour
Authors:
Dominik Schiller,
Tobias Hallmen,
Daksitha Withanage Don,
Elisabeth André,
Tobias Baur
Abstract:
Understanding human behavior is a fundamental goal of social sciences, yet its analysis presents significant challenges. Conventional methodologies employed for the study of behavior, characterized by labor-intensive data collection processes and intricate analyses, frequently hinder comprehensive exploration due to their time and resource demands. In response to these challenges, computational mo…
▽ More
Understanding human behavior is a fundamental goal of social sciences, yet its analysis presents significant challenges. Conventional methodologies employed for the study of behavior, characterized by labor-intensive data collection processes and intricate analyses, frequently hinder comprehensive exploration due to their time and resource demands. In response to these challenges, computational models have proven to be promising tools that help researchers analyze large amounts of data by automatically identifying important behavioral indicators, such as social signals. However, the widespread adoption of such state-of-the-art computational models is impeded by their inherent complexity and the substantial computational resources necessary to run them, thereby constraining accessibility for researchers without technical expertise and adequate equipment. To address these barriers, we introduce DISCOVER -- a modular and flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis. Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency. In this paper, we demonstrate the capabilities of DISCOVER using four exemplary data exploration workflows that build on each other: Interactive Semantic Content Exploration, Visual Inspection, Aided Annotation, and Multimodal Scene Search. By illustrating these workflows, we aim to emphasize the versatility and accessibility of DISCOVER as a comprehensive framework and propose a set of blueprints that can serve as a general starting point for exploratory data analysis.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Tidal Disruption of Stars by Supermassive Black Holes and Naked Singularities with Scalar Hair
Authors:
Eduardo Andre,
Alexander Tsirulev
Abstract:
In this paper we study tidal forces near strongly gravitating objects at the centers of galaxies. In our approach, dark matter surrounding the centers of galaxies is modeled by a nonlinear scalar field. We focus on static, asymptotically flat, spherically symmetric black holes and naked singularities supported by a real self-gravitating scalar field minimally coupled to gravity. We consider the in…
▽ More
In this paper we study tidal forces near strongly gravitating objects at the centers of galaxies. In our approach, dark matter surrounding the centers of galaxies is modeled by a nonlinear scalar field. We focus on static, asymptotically flat, spherically symmetric black holes and naked singularities supported by a real self-gravitating scalar field minimally coupled to gravity. We consider the influence of dark matter on the tidal forces caused by the central objects and some features of the motion of matter near these configurations. It turns out, first, that the event horizon radius of a scalar field black hole can be arbitrarily small as well as the radius of the innermost stable circular orbit, and tidal forces near this orbit can be very large. Second, we show that if strongly gravitating objects in the centers of galaxies are naked singularities, then bright flares (tidal disruption events) in the nuclei of galaxies can be explained in different way. In the vicinity of a scalar field naked singularity, baryonic matter is concentrated in the gravitational potential well forming a "gray shell" consisting of cold compressed matter. In this case, the flares are caused by the collisions of stars with this shell.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Socially Interactive Agents for Robotic Neurorehabilitation Training: Conceptualization and Proof-of-concept Study
Authors:
Rhythm Arora,
Pooja Prajod,
Matteo Lavit Nicora,
Daniele Panzeri,
Giovanni Tauro,
Rocco Vertechy,
Matteo Malosio,
Elisabeth André,
Patrick Gebhard
Abstract:
Individuals with diverse motor abilities often benefit from intensive and specialized rehabilitation therapies aimed at enhancing their functional recovery. Nevertheless, the challenge lies in the restricted availability of neurorehabilitation professionals, hindering the effective delivery of the necessary level of care. Robotic devices hold great potential in reducing the dependence on medical p…
▽ More
Individuals with diverse motor abilities often benefit from intensive and specialized rehabilitation therapies aimed at enhancing their functional recovery. Nevertheless, the challenge lies in the restricted availability of neurorehabilitation professionals, hindering the effective delivery of the necessary level of care. Robotic devices hold great potential in reducing the dependence on medical personnel during therapy but, at the same time, they generally lack the crucial human interaction and motivation that traditional in-person sessions provide. To bridge this gap, we introduce an AI-based system aimed at delivering personalized, out-of-hospital assistance during neurorehabilitation training. This system includes a rehabilitation training device, affective signal classification models, training exercises, and a socially interactive agent as the user interface. With the assistance of a professional, the envisioned system is designed to be tailored to accommodate the unique rehabilitation requirements of an individual patient. Conceptually, after a preliminary setup and instruction phase, the patient is equipped to continue their rehabilitation regimen autonomously in the comfort of their home, facilitated by a socially interactive agent functioning as a virtual coaching assistant. Our approach involves the integration of an interactive socially-aware virtual agent into a neurorehabilitation robotic framework, with the primary objective of recreating the social aspects inherent to in-person rehabilitation sessions. We also conducted a feasibility study to test the framework with healthy patients. The results of our preliminary investigation indicate that participants demonstrated a propensity to adapt to the system. Notably, the presence of the interactive agent during the proposed exercises did not act as a source of distraction; instead, it positively impacted users' engagement.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain
Authors:
Pooja Prajod,
Dominik Schiller,
Daksitha Withanage Don,
Elisabeth André
Abstract:
The limited size of pain datasets are a challenge in developing robust deep learning models for pain recognition. Transfer learning approaches are often employed in these scenarios. In this study, we investigate whether deep learned feature representation for one type of experimentally induced pain can be transferred to another. Participating in the AI4Pain challenge, our goal is to classify three…
▽ More
The limited size of pain datasets are a challenge in developing robust deep learning models for pain recognition. Transfer learning approaches are often employed in these scenarios. In this study, we investigate whether deep learned feature representation for one type of experimentally induced pain can be transferred to another. Participating in the AI4Pain challenge, our goal is to classify three levels of pain (No-Pain, Low-Pain, High-Pain). The challenge dataset contains data collected from 65 participants undergoing varying intensities of electrical pain. We utilize the video recording from the dataset to investigate the transferability of deep learned heat pain model to electrical pain. In our proposed approach, we leverage an existing heat pain convolutional neural network (CNN) - trained on BioVid dataset - as a feature extractor. The images from the challenge dataset are inputted to the pre-trained heat pain CNN to obtain feature vectors. These feature vectors are used to train two machine learning models: a simple feed-forward neural network and a long short-term memory (LSTM) network. Our approach was tested using the dataset's predefined training, validation, and testing splits. Our models outperformed the baseline of the challenge on both the validation and tests sets, highlighting the potential of models trained on other pain datasets for reliable feature extraction.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection
Authors:
Pooja Prajod,
Bhargavi Mahesh,
Elisabeth André
Abstract:
Automatic stress detection using heart rate variability (HRV) features has gained significant traction as it utilizes unobtrusive wearable sensors measuring signals like electrocardiogram (ECG) or blood volume pulse (BVP). However, detecting stress through such physiological signals presents a considerable challenge owing to the variations in recorded signals influenced by factors, such as perceiv…
▽ More
Automatic stress detection using heart rate variability (HRV) features has gained significant traction as it utilizes unobtrusive wearable sensors measuring signals like electrocardiogram (ECG) or blood volume pulse (BVP). However, detecting stress through such physiological signals presents a considerable challenge owing to the variations in recorded signals influenced by factors, such as perceived stress intensity and measurement devices. Consequently, stress detection models developed on one dataset may perform poorly on unseen data collected under different conditions. To address this challenge, this study explores the generalizability of machine learning models trained on HRV features for binary stress detection. Our goal extends beyond evaluating generalization performance; we aim to identify the characteristics of datasets that have the most significant influence on generalizability. We leverage four publicly available stress datasets (WESAD, SWELL-KW, ForDigitStress, VerBIO) that vary in at least one of the characteristics such as stress elicitation techniques, stress intensity, and sensor devices. Employing a cross-dataset evaluation approach, we explore which of these characteristics strongly influence model generalizability. Our findings reveal a crucial factor affecting model generalizability: stressor type. Models achieved good performance across datasets when the type of stressor (e.g., social stress in our case) remains consistent. Factors like stress intensity or brand of the measurement device had minimal impact on cross-dataset performance. Based on our findings, we recommend matching the stressor type when deploying HRV-based stress models in new environments. To the best of our knowledge, this is the first study to systematically investigate factors influencing the cross-dataset applicability of HRV-based stress models.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers
Authors:
Silvan Mertes,
Tobias Huber,
Christina Karle,
Katharina Weitz,
Ruben Schlagowski,
Cristina Conati,
Elisabeth André
Abstract:
In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communi…
▽ More
In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. However, to fully understand a decision, not only knowledge about relevant features is needed, but the awareness of irrelevant information also highly contributes to the creation of a user's mental model of an AI system. To this end, a novel approach for explaining AI systems called alterfactual explanations was recently proposed on a conceptual level. It is based on showing an alternative reality where irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black box models based on neural networks. To this end, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives interesting insights on how alterfactual explanations can complement counterfactual explanations.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance
Authors:
Jesse Reimann,
Nico Mansion,
James Haydon,
Benjamin Bray,
Agnishom Chattopadhyay,
Sota Sato,
Masaki Waga,
Étienne André,
Ichiro Hasuo,
Naoki Ueda,
Yosuke Yokoyama
Abstract:
As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) a…
▽ More
As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction
Authors:
Tobias Hallmen,
Fabian Deuser,
Norbert Oswald,
Elisabeth André
Abstract:
In this research, we introduce a novel methodology for assessing Emotional Mimicry Intensity (EMI) as part of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our methodology utilises the Wav2Vec 2.0 architecture, which has been pre-trained on an extensive podcast dataset, to capture a wide array of audio features that include both linguistic and paralinguistic componen…
▽ More
In this research, we introduce a novel methodology for assessing Emotional Mimicry Intensity (EMI) as part of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our methodology utilises the Wav2Vec 2.0 architecture, which has been pre-trained on an extensive podcast dataset, to capture a wide array of audio features that include both linguistic and paralinguistic components. We refine our feature extraction process by employing a fusion technique that combines individual features with a global mean vector, thereby embedding a broader contextual understanding into our analysis. A key aspect of our approach is the multi-task fusion strategy that not only leverages these features but also incorporates a pre-trained Valence-Arousal-Dominance (VAD) model. This integration is designed to refine emotion intensity prediction by concurrently processing multiple emotional dimensions, thereby embedding a richer contextual understanding into our framework. For the temporal analysis of audio data, our feature fusion process utilises a Long Short-Term Memory (LSTM) network. This approach, which relies solely on the provided audio data, shows marked advancements over the existing baseline, offering a more comprehensive understanding of emotional mimicry in naturalistic settings, achieving the second place in the EMI challenge.
△ Less
Submitted 16 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Expiring opacity problems in parametric timed automata
Authors:
Étienne André,
Engel Lefaucheux,
Dylan Marinho
Abstract:
Information leakage can have dramatic consequences on the security of real-time systems. Timing leaks occur when an attacker is able to infer private behavior depending on timing information. In this work, we propose a definition of expiring timed opacity w.r.t. execution time, where a system is opaque whenever the attacker is unable to deduce the reachability of some private state solely based on…
▽ More
Information leakage can have dramatic consequences on the security of real-time systems. Timing leaks occur when an attacker is able to infer private behavior depending on timing information. In this work, we propose a definition of expiring timed opacity w.r.t. execution time, where a system is opaque whenever the attacker is unable to deduce the reachability of some private state solely based on the execution time; in addition, the secrecy is violated only when the private state was entered "recently", i.e., within a given time bound (or expiration date) prior to system completion. This has an interesting parallel with concrete applications, notably cache deducibility: it may be useless for the attacker to know the cache content too late after its observance. We study here expiring timed opacity problems in timed automata. We consider the set of time bounds (or expiration dates) for which a system is opaque and show when they can be effectively computed for timed automata. We then study the decidability of several parameterized problems, when not only the bounds, but also some internal timing constants become timing parameters of unknown constant values.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
The AffectToolbox: Affect Analysis for Everyone
Authors:
Silvan Mertes,
Dominik Schiller,
Michael Dietz,
Elisabeth André,
Florian Lingenfelser
Abstract:
In the field of affective computing, where research continually advances at a rapid pace, the demand for user-friendly tools has become increasingly apparent. In this paper, we present the AffectToolbox, a novel software system that aims to support researchers in developing affect-sensitive studies and prototypes. The proposed system addresses the challenges posed by existing frameworks, which oft…
▽ More
In the field of affective computing, where research continually advances at a rapid pace, the demand for user-friendly tools has become increasingly apparent. In this paper, we present the AffectToolbox, a novel software system that aims to support researchers in developing affect-sensitive studies and prototypes. The proposed system addresses the challenges posed by existing frameworks, which often require profound programming knowledge and cater primarily to power-users or skilled developers. Aiming to facilitate ease of use, the AffectToolbox requires no programming knowledge and offers its functionality to reliably analyze the affective state of users through an accessible graphical user interface. The architecture encompasses a variety of models for emotion recognition on multiple affective channels and modalities, as well as an elaborate fusion system to merge multi-modal assessments into a unified result. The entire system is open-sourced and will be publicly available to ensure easy integration into more complex applications through a well-structured, Python-based code base - therefore marking a substantial contribution toward advancing affective computing research and fostering a more collaborative and inclusive environment within this interdisciplinary field.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents
Authors:
Daksitha Withanage Don,
Philipp Müller,
Fabrizio Nunnari,
Elisabeth André,
Patrick Gebhard
Abstract:
Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. While recently proposed machine learning based approaches to conversational behavior generation are a promising way to address this challenge, they have not yet been employed in SIAs. The primary reason for this is the lack o…
▽ More
Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. While recently proposed machine learning based approaches to conversational behavior generation are a promising way to address this challenge, they have not yet been employed in SIAs. The primary reason for this is the lack of a software toolkit integrating such approaches with SIA frameworks that conforms to the challenging real-time requirements of human-agent interaction scenarios. In our work, we for the first time present such a toolkit consisting of three main components: (1) real-time feature extraction capturing multi-modal social cues from the user; (2) behavior generation based on a recent state-of-the-art neural network approach; (3) visualization of the generated behavior supporting both FLAME-based and Apple ARKit-based interactive agents. We comprehensively evaluate the real-time performance of the whole framework and its components. In addition, we introduce pre-trained behavioral generation models derived from psychotherapy sessions for domain-specific listening behaviors. Our software toolkit, pivotal for deploying and assessing SIAs' listening behavior in real-time, is publicly available. Resources, including code, behavioural multi-modal features extracted from therapeutic interactions, are hosted at https://daksitha.github.io/ReNeLib
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Giving Robots a Voice: Human-in-the-Loop Voice Creation and open-ended Labeling
Authors:
Pol van Rijn,
Silvan Mertes,
Kathrin Janowski,
Katharina Weitz,
Nori Jacoby,
Elisabeth André
Abstract:
Speech is a natural interface for humans to interact with robots. Yet, aligning a robot's voice to its appearance is challenging due to the rich vocabulary of both modalities. Previous research has explored a few labels to describe robots and tested them on a limited number of robots and existing voices. Here, we develop a robot-voice creation tool followed by large-scale behavioral human experime…
▽ More
Speech is a natural interface for humans to interact with robots. Yet, aligning a robot's voice to its appearance is challenging due to the rich vocabulary of both modalities. Previous research has explored a few labels to describe robots and tested them on a limited number of robots and existing voices. Here, we develop a robot-voice creation tool followed by large-scale behavioral human experiments (N=2,505). First, participants collectively tune robotic voices to match 175 robot images using an adaptive human-in-the-loop pipeline. Then, participants describe their impression of the robot or their matched voice using another human-in-the-loop paradigm for open-ended labeling. The elicited taxonomy is then used to rate robot attributes and to predict the best voice for an unseen robot. We offer a web interface to aid engineers in customizing robot voices, demonstrating the synergy between cognitive science and machine learning for engineering tools.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Exploring the Dynamics between Cobot's Production Rhythm, Locus of Control and Emotional State in a Collaborative Assembly Scenario
Authors:
Marta Mondellini,
Matteo Lavit Nicora,
Pooja Prajod,
Elisabeth André,
Rocco Vertechy,
Alessandro Antonietti,
Matteo Malosio
Abstract:
In industrial scenarios, there is widespread use of collaborative robots (cobots), and growing interest is directed at evaluating and measuring the impact of some characteristics of the cobot on the human factor. In the present pilot study, the effect that the production rhythm (C1 - Slow, C2 - Fast, C3 - Adapted to the participant's pace) of a cobot has on the Experiential Locus of Control (ELoC)…
▽ More
In industrial scenarios, there is widespread use of collaborative robots (cobots), and growing interest is directed at evaluating and measuring the impact of some characteristics of the cobot on the human factor. In the present pilot study, the effect that the production rhythm (C1 - Slow, C2 - Fast, C3 - Adapted to the participant's pace) of a cobot has on the Experiential Locus of Control (ELoC) and the emotional state of 31 participants has been examined. The operators' performance, the degree of basic internal Locus of Control, and the attitude towards the robots were also considered. No difference was found regarding the emotional state and the ELoC in the three conditions, but considering the other psychological variables, a more complex situation emerges. Overall, results seem to indicate a need to consider the person's psychological characteristics to offer a differentiated and optimal interaction experience.
△ Less
Submitted 27 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge
Authors:
Siyang Song,
Micol Spitale,
Cheng Luo,
Cristina Palmero,
German Barquero,
Hengde Zhu,
Sergio Escalera,
Michel Valstar,
Tobias Baur,
Fabien Ringeval,
Elisabeth Andre,
Hatice Gunes
Abstract:
In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour. Then, how to develop a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions from an previous…
▽ More
In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour. Then, how to develop a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions from an previously unseen speaker behaviour is a challenging task. Following the successful organisation of the first REACT challenge (REACT 2023), this edition of the challenge (REACT 2024) employs a subset used by the previous challenge, which contains segmented 30-secs dyadic interaction clips originally recorded as part of the NOXI and RECOLA datasets, encouraging participants to develop and benchmark Machine Learning (ML) models that can generate multiple appropriate facial reactions (including facial image sequences and their attributes) given an input conversational partner's stimulus under various dyadic video conference scenarios. This paper presents: (i) the guidelines of the REACT 2024 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2024.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Gaze Detection and Analysis for Initiating Joint Activity in Industrial Human-Robot Collaboration
Authors:
Pooja Prajod,
Matteo Lavit Nicora,
Marta Mondellini,
Giovanni Tauro,
Rocco Vertechy,
Matteo Malosio,
Elisabeth André
Abstract:
Collaborative robots (cobots) are widely used in industrial applications, yet extensive research is still needed to enhance human-robot collaborations and operator experience. A potential approach to improve the collaboration experience involves adapting cobot behavior based on natural cues from the operator. Inspired by the literature on human-human interactions, we conducted a wizard-of-oz study…
▽ More
Collaborative robots (cobots) are widely used in industrial applications, yet extensive research is still needed to enhance human-robot collaborations and operator experience. A potential approach to improve the collaboration experience involves adapting cobot behavior based on natural cues from the operator. Inspired by the literature on human-human interactions, we conducted a wizard-of-oz study to examine whether a gaze towards the cobot can serve as a trigger for initiating joint activities in collaborative sessions. In this study, 37 participants engaged in an assembly task while their gaze behavior was analyzed. We employ a gaze-based attention recognition model to identify when the participants look at the cobot. Our results indicate that in most cases (84.88\%), the joint activity is preceded by a gaze towards the cobot. Furthermore, during the entire assembly cycle, the participants tend to look at the cobot around the time of the joint activity. To the best of our knowledge, this is the first study to analyze the natural gaze behavior of participants working on a joint activity with a robot during a collaborative assembly task.
△ Less
Submitted 1 February, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Does Difficulty even Matter? Investigating Difficulty Adjustment and Practice Behavior in an Open-Ended Learning Task
Authors:
Anan Schütt,
Tobias Huber,
Jauwairia Nasir,
Cristina Conati,
Elisabeth André
Abstract:
Difficulty adjustment in practice exercises has been shown to be beneficial for learning. However, previous research has mostly investigated close-ended tasks, which do not offer the students multiple ways to reach a valid solution. Contrary to this, in order to learn in an open-ended learning task, students need to effectively explore the solution space as there are multiple ways to reach a solut…
▽ More
Difficulty adjustment in practice exercises has been shown to be beneficial for learning. However, previous research has mostly investigated close-ended tasks, which do not offer the students multiple ways to reach a valid solution. Contrary to this, in order to learn in an open-ended learning task, students need to effectively explore the solution space as there are multiple ways to reach a solution. For this reason, the effects of difficulty adjustment could be different for open-ended tasks. To investigate this, as our first contribution, we compare different methods of difficulty adjustment in a user study conducted with 86 participants. Furthermore, as the practice behavior of the students is expected to influence how well the students learn, we additionally look at their practice behavior as a post-hoc analysis. Therefore, as a second contribution, we identify different types of practice behavior and how they link to students' learning outcomes and subjective evaluation measures as well as explore the influence the difficulty adjustment methods have on the practice behaviors. Our results suggest the usefulness of taking into account the practice behavior in addition to only using the practice performance to inform adaptive intervention and difficulty adjustment methods.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
Configuring Timing Parameters to Ensure Execution-Time Opacity in Timed Automata
Authors:
Étienne André,
Engel Lefaucheux,
Didier Lime,
Dylan Marinho,
Jun Sun
Abstract:
Timing information leakage occurs whenever an attacker successfully deduces confidential internal information by observing some timed information such as events with timestamps. Timed automata are an extension of finite-state automata with a set of clocks evolving linearly and that can be tested or reset, making this formalism able to reason on systems involving concurrency and timing constraints.…
▽ More
Timing information leakage occurs whenever an attacker successfully deduces confidential internal information by observing some timed information such as events with timestamps. Timed automata are an extension of finite-state automata with a set of clocks evolving linearly and that can be tested or reset, making this formalism able to reason on systems involving concurrency and timing constraints. In this paper, we summarize a recent line of works using timed automata as the input formalism, in which we assume that the attacker has access (only) to the system execution time. First, we address the following execution-time opacity problem: given a timed system modeled by a timed automaton, given a secret location and a final location, synthesize the execution times from the initial location to the final location for which one cannot deduce whether the secret location was visited. This means that for any such execution time, the system is opaque: either the final location is not reachable, or it is reachable with that execution time for both a run visiting and a run not visiting the secret location. We also address the full execution-time opacity problem, asking whether the system is opaque for all execution times; we also study a weak counterpart. Second, we add timing parameters, which are a way to configure a system: we identify a subclass of parametric timed automata with some decidability results. In addition, we devise a semi-algorithm for synthesizing timing parameter valuations guaranteeing that the resulting system is opaque. Third, we report on problems when the secret has itself an expiration date, thus defining expiring execution-time opacity problems. We finally show that our method can also apply to program analysis with configurable internal timings.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
Dense Integer-Complete Synthesis for Bounded Parametric Timed Automata
Authors:
Étienne André,
Didier Lime,
Olivier H. Roux
Abstract:
Ensuring the correctness of critical real-time systems, involving concurrent behaviors and timing requirements, is crucial. Timed automata extend finite-state automata with clocks, compared in guards and invariants with integer constants. Parametric timed automata (PTAs) extend timed automata with timing parameters. Parameter synthesis aims at computing dense sets of valuations for the timing para…
▽ More
Ensuring the correctness of critical real-time systems, involving concurrent behaviors and timing requirements, is crucial. Timed automata extend finite-state automata with clocks, compared in guards and invariants with integer constants. Parametric timed automata (PTAs) extend timed automata with timing parameters. Parameter synthesis aims at computing dense sets of valuations for the timing parameters, guaranteeing a good behavior. However, in most cases, the emptiness problem for reachability (i.e., the emptiness of the parameter valuations set for which some location is reachable) is undecidable for PTAs and, as a consequence, synthesis procedures do not terminate in general, even for bounded parameters. In this paper, we introduce a parametric extrapolation, that allows us to derive an underapproximation in the form of symbolic sets of valuations containing not only all the integer points ensuring reachability, but also all the (non-necessarily integer) convex combinations of these integer points, for general PTAs with a bounded parameter domain. We also propose two further algorithms synthesizing parameter valuations guaranteeing unavoidability, and preservation of the untimed behavior w.r.t. a reference parameter valuation, respectively. Our algorithms terminate and can output sets of valuations arbitrarily close to the complete result. We demonstrate their applicability and efficiency using the tools Roméo and IMITATOR on several benchmarks.
△ Less
Submitted 27 September, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Fostering User Engagement in the Critical Reflection of Arguments
Authors:
Klaus Weber,
Annalena Aicher,
Wolfang Minker,
Stefan Ultes,
Elisabeth André
Abstract:
A natural way to resolve different points of view and form opinions is through exchanging arguments and knowledge. Facing the vast amount of available information on the internet, people tend to focus on information consistent with their beliefs. Especially when the issue is controversial, information is often selected that does not challenge one's beliefs. To support a fair and unbiased opinion-b…
▽ More
A natural way to resolve different points of view and form opinions is through exchanging arguments and knowledge. Facing the vast amount of available information on the internet, people tend to focus on information consistent with their beliefs. Especially when the issue is controversial, information is often selected that does not challenge one's beliefs. To support a fair and unbiased opinion-building process, we propose a chatbot system that engages in a deliberative dialogue with a human. In contrast to persuasive systems, the envisioned chatbot aims to provide a diverse and representative overview - embedded in a conversation with the user. To account for a reflective and unbiased exploration of the topic, we enable the system to intervene if the user is too focused on their pre-existing opinion. Therefore we propose a model to estimate the users' reflective engagement (RUE), defined as their critical thinking and open-mindedness. We report on a user study with 58 participants to test our model and the effect of the intervention mechanism, discuss the implications of the results, and present perspectives for future work. The results show a significant effect on both user reflection and total user focus, proving our proposed approach's validity.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
MultiMediate'23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions
Authors:
Philipp Müller,
Michal Balazia,
Tobias Baur,
Michael Dietz,
Alexander Heimerl,
Dominik Schiller,
Mohammed Guermal,
Dominike Thomas,
François Brémond,
Jan Alexandersson,
Elisabeth André,
Andreas Bulling
Abstract:
Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with- and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes t…
▽ More
Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with- and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes the MultiMediate'23 challenge and presents novel sets of annotations for both tasks. For engagement estimation we collected novel annotations on the NOvice eXpert Interaction (NOXI) database. For bodily behaviour recognition, we annotated test recordings of the MPIIGroupInteraction corpus with the BBSI annotation scheme. In addition, we present baseline results for both challenge tasks.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
REACT2023: the first Multi-modal Multiple Appropriate Facial Reaction Generation Challenge
Authors:
Siyang Song,
Micol Spitale,
Cheng Luo,
German Barquero,
Cristina Palmero,
Sergio Escalera,
Michel Valstar,
Tobias Baur,
Fabien Ringeval,
Elisabeth Andre,
Hatice Gunes
Abstract:
The Multi-modal Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the firs…
▽ More
The Multi-modal Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing and to foster collaboration among the audio, visual, and audio-visual affective computing communities, to compare the relative merits of the approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) novelties, contributions and guidelines of the REACT2023 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation, respectively. The challenge baseline code is publicly available at \url{https://github.com/reactmultimodalchallenge/baseline_react2023}.
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
-
Towards Automated COVID-19 Presence and Severity Classification
Authors:
Dominik Müller,
Niklas Schröter,
Silvan Mertes,
Fabio Hellmann,
Miriam Elia,
Wolfgang Reif,
Bernhard Bauer,
Elisabeth André,
Frank Kramer
Abstract:
COVID-19 presence classification and severity prediction via (3D) thorax computed tomography scans have become important tasks in recent times. Especially for capacity planning of intensive care units, predicting the future severity of a COVID-19 patient is crucial. The presented approach follows state-of-theart techniques to aid medical professionals in these situations. It comprises an ensemble…
▽ More
COVID-19 presence classification and severity prediction via (3D) thorax computed tomography scans have become important tasks in recent times. Especially for capacity planning of intensive care units, predicting the future severity of a COVID-19 patient is crucial. The presented approach follows state-of-theart techniques to aid medical professionals in these situations. It comprises an ensemble learning strategy via 5-fold cross-validation that includes transfer learning and combines pre-trained 3D-versions of ResNet34 and DenseNet121 for COVID19 classification and severity prediction respectively. Further, domain-specific preprocessing was applied to optimize model performance. In addition, medical information like the infection-lung-ratio, patient age, and sex were included. The presented model achieves an AUC of 79.0% to predict COVID-19 severity, and 83.7% AUC to classify the presence of an infection, which is comparable with other currently popular methods. This approach is implemented using the AUCMEDI framework and relies on well-known network architectures to ensure robustness and reproducibility.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Parameterized Verification of Disjunctive Timed Networks
Authors:
Étienne André,
Paul Eichler,
Swen Jacobs,
Shyam Lal Karra
Abstract:
We introduce new techniques for the parameterized verification of disjunctive timed networks (DTNs), i.e., networks of timed automata (TAs) that communicate via location guards that enable a transition only if there is another process in a given location. This computational model has been considered in the literature before, example applications are gossiping clock synchronization protocols or pla…
▽ More
We introduce new techniques for the parameterized verification of disjunctive timed networks (DTNs), i.e., networks of timed automata (TAs) that communicate via location guards that enable a transition only if there is another process in a given location. This computational model has been considered in the literature before, example applications are gossiping clock synchronization protocols or planning problems. We address the minimum-time reachability problem (Minreach) in DTNs, and show how to efficiently solve it based on a novel zone graph algorithm. We further show that solving Minreach allows us to construct a summary TA capturing exactly the possible behaviors of a single TA within a DTN of arbitrary size. The combination of these two results enables the parameterized verification of DTNs, while avoiding the construction of an exponential-size cutoff system required by existing results. Additionally, we develop sufficient conditions for solving Minreach and parameterized verification problems even in certain cases where locations that appear in location guards can have clock invariants, a case that has usually been excluded in the literature. Our techniques are also implemented, and experiments show their practicality.
△ Less
Submitted 2 January, 2024; v1 submitted 12 May, 2023;
originally announced May 2023.
-
GANonymization: A GAN-based Face Anonymization Framework for Preserving Emotional Expressions
Authors:
Fabio Hellmann,
Silvan Mertes,
Mohamed Benouis,
Alexander Hustinx,
Tzung-Chien Hsieh,
Cristina Conati,
Peter Krawitz,
Elisabeth André
Abstract:
In recent years, the increasing availability of personal data has raised concerns regarding privacy and security. One of the critical processes to address these concerns is data anonymization, which aims to protect individual privacy and prevent the release of sensitive information. This research focuses on the importance of face anonymization. Therefore, we introduce GANonymization, a novel face…
▽ More
In recent years, the increasing availability of personal data has raised concerns regarding privacy and security. One of the critical processes to address these concerns is data anonymization, which aims to protect individual privacy and prevent the release of sensitive information. This research focuses on the importance of face anonymization. Therefore, we introduce GANonymization, a novel face anonymization framework with facial expression-preserving abilities. Our approach is based on a high-level representation of a face, which is synthesized into an anonymized version based on a generative adversarial network (GAN). The effectiveness of the approach was assessed by evaluating its performance in removing identifiable facial attributes to increase the anonymity of the given individual face. Additionally, the performance of preserving facial expressions was evaluated on several affect recognition datasets and outperformed the state-of-the-art methods in most categories. Finally, our approach was analyzed for its ability to remove various facial traits, such as jewelry, hair color, and multiple others. Here, it demonstrated reliable performance in removing these attributes. Our results suggest that GANonymization is a promising approach for anonymizing faces while preserving facial expressions.
△ Less
Submitted 14 November, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Gaze-based Attention Recognition for Human-Robot Collaboration
Authors:
Pooja Prajod,
Matteo Lavit Nicora,
Matteo Malosio,
Elisabeth André
Abstract:
Attention (and distraction) recognition is a key factor in improving human-robot collaboration. We present an assembly scenario where a human operator and a cobot collaborate equally to piece together a gearbox. The setup provides multiple opportunities for the cobot to adapt its behavior depending on the operator's attention, which can improve the collaboration experience and reduce psychological…
▽ More
Attention (and distraction) recognition is a key factor in improving human-robot collaboration. We present an assembly scenario where a human operator and a cobot collaborate equally to piece together a gearbox. The setup provides multiple opportunities for the cobot to adapt its behavior depending on the operator's attention, which can improve the collaboration experience and reduce psychological strain. As a first step, we recognize the areas in the workspace that the human operator is paying attention to, and consequently, detect when the operator is distracted. We propose a novel deep-learning approach to develop an attention recognition model. First, we train a convolutional neural network to estimate the gaze direction using a publicly available image dataset. Then, we use transfer learning with a small dataset to map the gaze direction onto pre-defined areas of interest. Models trained using this approach performed very well in leave-one-subject-out evaluation on the small dataset. We performed an additional validation of our models using the video snippets collected from participants working as an operator in the presented assembly scenario. Although the recall for the Distracted class was lower in this case, the models performed well in recognizing the areas the operator paid attention to. To the best of our knowledge, this is the first work that validated an attention recognition model using data from a setting that mimics industrial human-robot collaboration. Our findings highlight the need for validation of attention recognition solutions in such full-fledged, non-guided scenarios.
△ Less
Submitted 30 March, 2023;
originally announced March 2023.
-
ForDigitStress: A multi-modal stress dataset employing a digital job interview scenario
Authors:
Alexander Heimerl,
Pooja Prajod,
Silvan Mertes,
Tobias Baur,
Matthias Kraus,
Ailin Liu,
Helen Risack,
Nicolas Rohleder,
Elisabeth André,
Linda Becker
Abstract:
We present a multi-modal stress dataset that uses digital job interviews to induce stress. The dataset provides multi-modal data of 40 participants including audio, video (motion capturing, facial recognition, eye tracking) as well as physiological information (photoplethysmography, electrodermal activity). In addition to that, the dataset contains time-continuous annotations for stress and occurr…
▽ More
We present a multi-modal stress dataset that uses digital job interviews to induce stress. The dataset provides multi-modal data of 40 participants including audio, video (motion capturing, facial recognition, eye tracking) as well as physiological information (photoplethysmography, electrodermal activity). In addition to that, the dataset contains time-continuous annotations for stress and occurred emotions (e.g. shame, anger, anxiety, surprise). In order to establish a baseline, five different machine learning classifiers (Support Vector Machine, K-Nearest Neighbors, Random Forest, Long-Short-Term Memory Network) have been trained and evaluated on the proposed dataset for a binary stress classification task. The best-performing classifier achieved an accuracy of 88.3% and an F1-score of 87.5%.
△ Less
Submitted 14 March, 2023;
originally announced March 2023.
-
GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies through Visual Counterfactual Explanations
Authors:
Tobias Huber,
Maximilian Demmler,
Silvan Mertes,
Matthew L. Olson,
Elisabeth André
Abstract:
Counterfactual explanations are a common tool to explain artificial intelligence models. For Reinforcement Learning (RL) agents, they answer "Why not?" or "What if?" questions by illustrating what minimal change to a state is needed such that an agent chooses a different action. Generating counterfactual explanations for RL agents with visual input is especially challenging because of their large…
▽ More
Counterfactual explanations are a common tool to explain artificial intelligence models. For Reinforcement Learning (RL) agents, they answer "Why not?" or "What if?" questions by illustrating what minimal change to a state is needed such that an agent chooses a different action. Generating counterfactual explanations for RL agents with visual input is especially challenging because of their large state spaces and because their decisions are part of an overarching policy, which includes long-term decision-making. However, research focusing on counterfactual explanations, specifically for RL agents with visual input, is scarce and does not go beyond identifying defective agents. It is unclear whether counterfactual explanations are still helpful for more complex tasks like analyzing the learned strategies of different agents or choosing a fitting agent for a specific task. We propose a novel but simple method to generate counterfactual explanations for RL agents by formulating the problem as a domain transfer problem which allows the use of adversarial learning techniques like StarGAN. Our method is fully model-agnostic and we demonstrate that it outperforms the only previous method in several computational metrics. Furthermore, we show in a user study that our method performs best when analyzing which strategies different agents pursue.
△ Less
Submitted 24 February, 2023;
originally announced February 2023.
-
Around the world in 60 words: A generative vocabulary test for online research
Authors:
Pol van Rijn,
Yue Sun,
Harin Lee,
Raja Marjieh,
Ilia Sucholutsky,
Francesca Lanzarini,
Elisabeth André,
Nori Jacoby
Abstract:
Conducting experiments with diverse participants in their native languages can uncover insights into culture, cognition, and language that may not be revealed otherwise. However, conducting these experiments online makes it difficult to validate self-reported language proficiency. Furthermore, existing proficiency tests are small and cover only a few languages. We present an automated pipeline to…
▽ More
Conducting experiments with diverse participants in their native languages can uncover insights into culture, cognition, and language that may not be revealed otherwise. However, conducting these experiments online makes it difficult to validate self-reported language proficiency. Furthermore, existing proficiency tests are small and cover only a few languages. We present an automated pipeline to generate vocabulary tests using text from Wikipedia. Our pipeline samples rare nouns and creates pseudowords with the same low-level statistics. Six behavioral experiments (N=236) in six countries and eight languages show that (a) our test can distinguish between native speakers of closely related languages, (b) the test is reliable ($r=0.82$), and (c) performance strongly correlates with existing tests (LexTale) and self-reports. We further show that test accuracy is negatively correlated with the linguistic distance between the tested and the native language. Our test, available in eight languages, can easily be extended to other languages.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.