-
Automatic Depression Assessment using Machine Learning: A Comprehensive Survey
Authors:
Siyang Song,
Yupeng Huo,
Shiqing Tang,
Jiaee Cheong,
Rui Gao,
Michel Valstar,
Hatice Gunes
Abstract:
Depression is a common mental illness across current human society. Traditional depression assessment relying on inventories and interviews with psychologists frequently suffers from subjective diagnosis results, a slow and expensive diagnosis process, and a lack of human resources. Since there is solid evidence that depression is reflected by various human internal brain activities and external expressive behaviours, early traditional machine learning (ML) and advanced deep learning (DL) models have been widely explored for human behaviour-based automatic depression assessment (ADA) since 2012. However, recent ADA surveys typically only focus on a limited number of human behaviour modalities. Despite being used as a theoretical basis for developing ADA approaches, existing ADA surveys lack a comprehensive review and summary of multi-modal depression-related human behaviours. To bridge this gap, this paper specifically summarises depression-related human behaviours across a range of modalities (e.g. the human brain, verbal language and non-verbal audio/facial/body behaviours). We focus on conducting an up-to-date and comprehensive survey of ML-based ADA approaches for learning depression cues from these behaviours as well as discussing and comparing their distinctive features and limitations. In addition, we review existing ADA competitions and datasets, and identify and discuss the main challenges and opportunities to provide further research directions for future ADA researchers.
Submitted 29 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support
Authors:
Sophie Chiang,
Guy Laban,
Hatice Gunes
Abstract:
As conversational agents increasingly engage in emotionally supportive dialogue, it is important to understand how closely their interactions resemble those in traditional therapy settings. This study investigates whether the concerns shared with a robot align with those shared in human-to-human (H2H) therapy sessions, and whether robot responses semantically mirror those of human therapists. We analyzed two datasets: one of interactions between users and professional therapists (Hugging Face's NLP Mental Health Conversations), and another involving supportive conversations with a social robot (QTrobot from LuxAI) powered by a large language model (LLM, GPT-3.5). Using sentence embeddings and K-means clustering, we assessed cross-agent thematic alignment by applying a distance-based cluster-fitting method that evaluates whether responses from one agent type map to clusters derived from the other, and validated it using Euclidean distances. Results showed that 90.88% of robot conversation disclosures could be mapped to clusters from the human therapy dataset, suggesting shared topical structure. For matched clusters, we compared the subjects as well as therapist and robot responses using Transformer, Word2Vec, and BERT embeddings, revealing strong semantic overlap in subjects' disclosures in both datasets, as well as in the responses given to similar human disclosure themes across agent types (robot vs. human therapist). These findings highlight both the parallels and boundaries of robot-led support conversations and their potential for augmenting mental health interventions.
Submitted 19 June, 2025;
originally announced June 2025.
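The distance-based cluster fitting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the acceptance rule, so the version below assumes one: a disclosure from the second dataset counts as "mapped" when its Euclidean distance to the nearest centroid is no larger than that cluster's radius (the largest member-to-centroid distance in the first dataset). All function names and the toy 2-D vectors are hypothetical; real sentence embeddings would have hundreds of dimensions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_radii(points, labels, centroids):
    """Largest member-to-centroid distance for each cluster (assumed acceptance radius)."""
    radii = [0.0] * len(centroids)
    for p, k in zip(points, labels):
        radii[k] = max(radii[k], euclidean(p, centroids[k]))
    return radii


def fit_to_clusters(queries, centroids, radii):
    """Assign each query to its nearest centroid, or None if it lies outside that cluster's radius."""
    assigned = []
    for q in queries:
        dists = [euclidean(q, c) for c in centroids]
        k = min(range(len(centroids)), key=dists.__getitem__)
        assigned.append(k if dists[k] <= radii[k] else None)
    return assigned


def mapped_fraction(assigned):
    """Share of queries that were mapped to some cluster."""
    return sum(a is not None for a in assigned) / len(assigned)


# Toy "therapy" embeddings with precomputed K-means labels and centroids (hypothetical data).
therapy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
therapy_labels = [0, 0, 1, 1]
centroids = [(0.0, 0.5), (5.0, 5.5)]

# Toy "robot" embeddings to be fitted against the therapy clusters.
robot_points = [(0.0, 0.8), (5.0, 5.2), (10.0, 10.0)]

radii = cluster_radii(therapy_points, therapy_labels, centroids)
assigned = fit_to_clusters(robot_points, centroids, radii)
fraction = mapped_fraction(assigned)
```

Here two of the three toy robot vectors fall inside a therapy cluster and the third does not; the mapped fraction plays the role of the percentage reported in the abstract, under the assumed radius rule.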
-
Critical Insights about Robots for Mental Wellbeing
Authors:
Guy Laban,
Micol Spitale,
Minja Axelsson,
Nida Itrat Abbasi,
Hatice Gunes
Abstract:
Social robots are increasingly being explored as tools to support emotional wellbeing, particularly in non-clinical settings. Drawing on a range of empirical studies and practical deployments, this paper outlines six key insights that highlight both the opportunities and challenges in using robots to promote mental wellbeing. These include (1) the lack of a single, objective measure of wellbeing, (2) the fact that robots don't need to act as companions to be effective, (3) the growing potential of virtual interactions, (4) the importance of involving clinicians in the design process, (5) the difference between one-off and long-term interactions, and (6) the idea that adaptation and personalization are not always necessary for positive outcomes. Rather than positioning robots as replacements for human therapists, we argue that they are best understood as supportive tools that must be designed with care, grounded in evidence, and shaped by ethical and psychological considerations. Our aim is to inform future research and guide responsible, effective use of robots in mental health and wellbeing contexts.
Submitted 16 June, 2025;
originally announced June 2025.
-
Gender Fairness of Machine Learning Algorithms for Pain Detection
Authors:
Dylan Green,
Yuting Shang,
Jiaee Cheong,
Yang Liu,
Hatice Gunes
Abstract:
Automated pain detection through machine learning (ML) and deep learning (DL) algorithms holds significant potential in healthcare, particularly for patients unable to self-report pain levels. However, the accuracy and fairness of these algorithms across different demographic groups (e.g., gender) remain under-researched. This paper investigates the gender fairness of ML and DL models trained on the UNBC-McMaster Shoulder Pain Expression Archive Database, evaluating the performance of various models in detecting pain based solely on the visual modality of participants' facial expressions. We compare traditional ML algorithms, Linear Support Vector Machine (L SVM) and Radial Basis Function SVM (RBF SVM), with DL methods, Convolutional Neural Network (CNN) and Vision Transformer (ViT), using a range of performance and fairness metrics. While ViT achieved the highest accuracy and a selection of fairness metrics, all models exhibited gender-based biases. These findings highlight the persistent trade-off between accuracy and fairness, emphasising the need for fairness-aware techniques to mitigate biases in automated healthcare systems.
Submitted 10 June, 2025;
originally announced June 2025.
-
FG 2025 TrustFAA: the First Workshop on Towards Trustworthy Facial Affect Analysis: Advancing Insights of Fairness, Explainability, and Safety (TrustFAA)
Authors:
Jiaee Cheong,
Yang Liu,
Harold Soh,
Hatice Gunes
Abstract:
With the increasing prevalence and deployment of Emotion AI-powered facial affect analysis (FAA) tools, concerns about the trustworthiness of these systems have become more prominent. This first workshop on "Towards Trustworthy Facial Affect Analysis: Advancing Insights of Fairness, Explainability, and Safety (TrustFAA)" aims to bring together researchers who are investigating different challenges in relation to trustworthiness (such as interpretability, uncertainty, biases, and privacy) across various facial affect analysis tasks, including macro-/micro-expression recognition, facial action unit detection, other corresponding applications such as pain and depression detection, as well as human-robot interaction and collaboration. In alignment with FG2025's emphasis on ethics, as demonstrated by the inclusion of an Ethical Impact Statement requirement for this year's submissions, this workshop supports FG2025's efforts by encouraging research, discussion and dialogue on trustworthy FAA.
Submitted 5 June, 2025;
originally announced June 2025.
-
GraphAU-Pain: Graph-based Action Unit Representation for Pain Intensity Estimation
Authors:
Zhiyu Wang,
Yang Liu,
Hatice Gunes
Abstract:
Understanding pain-related facial behaviors is essential for digital healthcare in terms of effective monitoring, assisted diagnostics, and treatment planning, particularly for patients unable to communicate verbally. Existing data-driven methods of detecting pain from facial expressions are limited due to interpretability and severity quantification. To this end, we propose GraphAU-Pain, leveraging a graph-based framework to model facial Action Units (AUs) and their interrelationships for pain intensity estimation. AUs are represented as graph nodes, with co-occurrence relationships as edges, enabling a more expressive depiction of pain-related facial behaviors. By utilizing a relational graph neural network, our framework offers improved interpretability and significant performance gains. Experiments conducted on the publicly available UNBC dataset demonstrate the effectiveness of the GraphAU-Pain, achieving an F1-score of 66.21% and accuracy of 87.61% in pain intensity estimation.
Submitted 17 June, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge
Authors:
Siyang Song,
Micol Spitale,
Xiangyu Kong,
Hengde Zhu,
Cheng Luo,
Cristina Palmero,
German Barquero,
Sergio Escalera,
Michel Valstar,
Mohamed Daoudi,
Tobias Baur,
Fabien Ringeval,
Andrew Howes,
Elisabeth Andre,
Hatice Gunes
Abstract:
In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we are proposing the REACT 2025 challenge encouraging the development and benchmarking of Machine Learning (ML) models that can be used to generate multiple appropriate, diverse, realistic and synchronised human-style facial reactions expressed by human listeners in response to an input stimulus (i.e., audio-visual behaviours expressed by their corresponding speakers). As a key of the challenge, we provide challenge participants with the first natural and large-scale multi-modal MAFRG dataset (called MARS) recording 137 human-human dyadic interactions containing a total of 2856 interaction sessions covering five different topics. In addition, this paper also presents the challenge guidelines and the performance of our baselines on the two proposed sub-challenges: Offline MAFRG and Online MAFRG, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2025
Submitted 22 May, 2025;
originally announced May 2025.
-
Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness
Authors:
Mojtaba Kolahdouzi,
Hatice Gunes,
Ali Etemad
Abstract:
We study whether and how the choice of optimization algorithm can impact group fairness in deep neural networks. Through stochastic differential equation analysis of optimization dynamics in an analytically tractable setup, we demonstrate that the choice of optimization algorithm indeed influences fairness outcomes, particularly under severe imbalance. Furthermore, we show that when comparing two categories of optimizers, adaptive methods and stochastic methods, RMSProp (from the adaptive category) has a higher likelihood of converging to fairer minima than SGD (from the stochastic category). Building on this insight, we derive two new theoretical guarantees showing that, under appropriate conditions, RMSProp exhibits fairer parameter updates and improved fairness in a single optimization step compared to SGD. We then validate these findings through extensive experiments on three publicly available datasets, namely CelebA, FairFace, and MS-COCO, across different tasks such as facial expression recognition, gender classification, and multi-label classification, using various backbones. Considering multiple fairness definitions including equalized odds, equal opportunity, and demographic parity, adaptive optimizers like RMSProp and Adam consistently outperform SGD in terms of group fairness, while maintaining comparable predictive accuracy. Our results highlight the role of adaptive updates as a crucial yet overlooked mechanism for promoting fair outcomes.
Submitted 21 April, 2025;
originally announced April 2025.
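The three group fairness definitions named in this abstract (demographic parity, equal opportunity, equalized odds) can be made concrete with a small sketch for binary predictions and two groups. This is only an illustration under common textbook definitions, not the paper's evaluation code: the gap formulations (absolute differences, and the larger of the TPR and FPR gaps for equalized odds) are one frequent convention, and the function names and toy data are hypothetical. It assumes every group contains both positive and negative ground-truth examples.

```python
def rate(preds, mask):
    """Mean prediction over the entries selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected)


def group_rates(y_true, y_pred, group, g):
    """Positive-prediction rate, true-positive rate and false-positive rate for group g."""
    in_g = [a == g for a in group]
    pos_rate = rate(y_pred, in_g)
    tpr = rate(y_pred, [m and y == 1 for m, y in zip(in_g, y_true)])
    fpr = rate(y_pred, [m and y == 0 for m, y in zip(in_g, y_true)])
    return pos_rate, tpr, fpr


def fairness_gaps(y_true, y_pred, group):
    """Absolute gaps between exactly two groups; 0.0 means parity on that criterion."""
    a, b = sorted(set(group))
    pos_a, tpr_a, fpr_a = group_rates(y_true, y_pred, group, a)
    pos_b, tpr_b, fpr_b = group_rates(y_true, y_pred, group, b)
    return {
        "demographic_parity": abs(pos_a - pos_b),
        "equal_opportunity": abs(tpr_a - tpr_b),
        "equalized_odds": max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)),
    }


# Hypothetical toy data: the classifier misses one positive case in group "b".
group = ["a"] * 4 + ["b"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

gaps = fairness_gaps(y_true, y_pred, group)
```

In the toy data the classifier is perfect on group "a" but recovers only half of the positives in group "b", so all three gaps are non-zero even though neither group has false positives.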
-
Comparing Self-Disclosure Themes and Semantics to a Human, a Robot, and a Disembodied Agent
Authors:
Sophie Chiang,
Guy Laban,
Emily S. Cross,
Hatice Gunes
Abstract:
As social robots and other artificial agents become more conversationally capable, it is important to understand whether the content and meaning of self-disclosure towards these agents changes depending on the agent's embodiment. In this study, we analysed conversational data from three controlled experiments in which participants self-disclosed to a human, a humanoid social robot, and a disembodied conversational agent. Using sentence embeddings and clustering, we identified themes in participants' disclosures, which were then labelled and explained by a large language model. We subsequently assessed whether these themes and the underlying semantic structure of the disclosures varied by agent embodiment. Our findings reveal strong consistency: thematic distributions did not significantly differ across embodiments, and semantic similarity analyses showed that disclosures were expressed in highly comparable ways. These results suggest that while embodiment may influence human behaviour in human-robot and human-agent interactions, people tend to maintain a consistent thematic focus and semantic structure in their disclosures, whether speaking to humans or artificial interlocutors.
Submitted 8 April, 2025;
originally announced April 2025.
-
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
Authors:
Wang Tang,
Fethiye Irmak Dogan,
Linbo Qing,
Hatice Gunes
Abstract:
Dyadic social relationships, which refer to relationships between two individuals who know each other through repeated interactions (or not), are shaped by shared spatial and temporal experiences. Current computational methods for modeling these relationships face three major challenges: (1) the failure to model asymmetric relationships, e.g., one individual may perceive the other as a friend while the other perceives them as an acquaintance, (2) the disruption of continuous interactions by discrete frame sampling, which segments the temporal continuity of interaction in real-world scenarios, and (3) the limitation to consider periodic behavioral cues, such as rhythmic vocalizations or recurrent gestures, which are crucial for inferring the evolution of dyadic relationships. To address these challenges, we propose AsyReC, a multimodal graph-based framework for asymmetric dyadic relationship classification, with three core innovations: (i) a triplet graph neural network with node-edge dual attention that dynamically weights multimodal cues to capture interaction asymmetries (addressing challenge 1); (ii) a clip-level relationship learning architecture that preserves temporal continuity, enabling fine-grained modeling of real-world interaction dynamics (addressing challenge 2); and (iii) a periodic temporal encoder that projects time indices onto sine/cosine waveforms to model recurrent behavioral patterns (addressing challenge 3). Extensive experiments on two public datasets demonstrate state-of-the-art performance, while ablation studies validate the critical role of asymmetric interaction modeling and periodic temporal encoding in improving the robustness of dyadic relationship classification in real-world scenarios. Our code is publicly available at: https://github.com/tw-repository/AsyReC.
Submitted 7 April, 2025;
originally announced April 2025.
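The idea of projecting time indices onto sine/cosine waveforms, as described for the periodic temporal encoder in this abstract, can be sketched in a few lines. This is a generic illustration and not AsyReC's actual encoder: the set of periods, the feature layout, and the function name are assumptions, and in a learned model the periods or frequencies would typically be trainable.

```python
import math


def periodic_encoding(t, periods):
    """Encode time index t as [sin, cos] pairs, one pair per assumed period."""
    features = []
    for period in periods:
        angle = 2.0 * math.pi * t / period
        features.extend((math.sin(angle), math.cos(angle)))
    return features


# Hypothetical periods (in clip or frame units) chosen only for illustration.
periods = [4, 8]
codes = [periodic_encoding(t, periods) for t in range(16)]
```

Because each pair repeats every `period` steps, two time indices that are one full cycle apart receive nearly identical features on that component, which is what lets a downstream model pick up recurrent cues such as rhythmic gestures.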
-
What People Share With a Robot When Feeling Lonely and Stressed and How It Helps Over Time
Authors:
Guy Laban,
Sophie Chiang,
Hatice Gunes
Abstract:
Loneliness and stress are prevalent among young adults and are linked to significant psychological and health-related consequences. Social robots may offer a promising avenue for emotional support, especially when considering the ongoing advancements in conversational AI. This study investigates how repeated interactions with a social robot influence feelings of loneliness and perceived stress, and how such feelings are reflected in the themes of user disclosures towards the robot. Participants engaged in a five-session robot-led intervention, where a large language model-powered QTrobot facilitated structured conversations designed to support cognitive reappraisal. Results from linear mixed-effects models show significant reductions in both loneliness and perceived stress over time. Additionally, semantic clustering of 560 user disclosures towards the robot revealed six distinct conversational themes. Results from a Kruskal-Wallis H-test demonstrate that participants reporting higher loneliness and stress more frequently engaged in socially focused disclosures, such as friendship and connection, whereas lower distress was associated with introspective and goal-oriented themes (e.g., academic ambitions). By exploring both how the intervention affects well-being and how well-being shapes the content of robot-directed conversations, we aim to capture the dynamic nature of emotional support in human-robot interaction.
Submitted 3 April, 2025;
originally announced April 2025.
-
Robot-Led Vision Language Model Wellbeing Assessment of Children
Authors:
Nida Itrat Abbasi,
Fethiye Irmak Dogan,
Guy Laban,
Joanna Anderson,
Tamsin Ford,
Peter B. Jones,
Hatice Gunes
Abstract:
This study presents a novel robot-led approach to assessing children's mental wellbeing using a Vision Language Model (VLM). Inspired by the Child Apperception Test (CAT), the social robot NAO presented children with pictorial stimuli to elicit their verbal narratives of the images, which were then evaluated by a VLM in accordance with CAT assessment guidelines. The VLM's assessments were systematically compared to those provided by a trained psychologist. The results reveal that while the VLM demonstrates moderate reliability in identifying cases with no wellbeing concerns, its ability to accurately classify assessments with clinical concern remains limited. Moreover, although the model's performance was generally consistent when prompted with varying demographic factors such as age and gender, a significantly higher false positive rate was observed for girls, indicating potential sensitivity to gender attribute. These findings highlight both the promise and the challenges of integrating VLMs into robot-led assessments of children's wellbeing.
Submitted 3 April, 2025;
originally announced April 2025.
-
A Robot-Led Intervention for Emotion Regulation: From Expression to Reappraisal
Authors:
Guy Laban,
Julie Wang,
Hatice Gunes
Abstract:
Emotion regulation is a crucial skill for managing emotions in everyday life, yet finding a constructive and accessible method to support these processes remains challenging due to their cognitive demands. In this study, we explore how regular interactions with a social robot, conducted in a structured yet familiar environment within university halls and departments, can provide effective support for emotion regulation through cognitive reappraisal. Twenty-one students participated in a five-session study at a university hall or department, where the robot, powered by a large language model (GPT-3.5), facilitated structured conversations, encouraging the students to reinterpret emotionally charged situations they shared with the robot. Quantitative and qualitative results indicate significant improvements in emotion self-regulation, with participants reporting better understanding and control of their emotions. The intervention led to significant changes in constructive emotion regulation tendencies and positive effects on mood and sentiment after each session. The findings also demonstrate that repeated interactions with the robot encouraged greater emotional expressiveness, including longer speech disclosures, increased use of affective language, and heightened facial arousal. Notably, expressiveness followed structured patterns aligned with the reappraisal process, with expression peaking during key reappraisal moments, particularly when participants were prompted to reinterpret negative experiences. The qualitative feedback further highlighted how the robot fostered introspection and provided a supportive space for discussing emotions, enabling participants to confront long-avoided emotional challenges. These findings demonstrate the potential of robots to effectively assist in emotion regulation in familiar environments, offering both emotional support and cognitive guidance.
Submitted 30 June, 2025; v1 submitted 23 March, 2025;
originally announced March 2025.
-
Stakeholder Perspectives on Whether and How Social Robots Can Support Mediation and Advocacy for Higher Education Students with Disabilities
Authors:
Alva Markelius,
Julie Bailey,
Jenny L. Gibson,
Hatice Gunes
Abstract:
This paper presents an iterative, participatory, empirical study that examines the potential of using artificial intelligence, such as social robots and large language models, to support mediation and advocacy for students with disabilities in higher education. Drawing on qualitative data from interviews and focus groups conducted with various stakeholders, including disabled students, disabled student representatives, and disability practitioners at the University of Cambridge, this study reports findings relating to understanding the problem space, ideating robotic support and participatory co-design of advocacy support robots. The findings highlight the potential of these technologies in providing signposting and acting as a sounding board or study companion, while also addressing limitations in empathic understanding, trust, equity, and accessibility. We discuss ethical considerations, including intersectional biases, the double empathy problem, and the implications of deploying social robots in contexts shaped by structural inequalities. Finally, we offer a set of recommendations and suggestions for future research, rethinking the notion of corrective technological interventions to tools that empower and amplify self-advocacy.
Submitted 11 March, 2025;
originally announced March 2025.
-
Exploring Causality for HRI: A Case Study on Robotic Mental Well-being Coaching
Authors:
Micol Spitale,
Srikar Babu,
Serhan Cakmak,
Jiaee Cheong,
Hatice Gunes
Abstract:
One of the primary goals of Human-Robot Interaction (HRI) research is to develop robots that can interpret human behavior and adapt their responses accordingly. Adaptive learning models, such as continual and reinforcement learning, play a crucial role in improving robots' ability to interact effectively in real-world settings. However, these models face significant challenges due to the limited availability of real-world data, particularly in sensitive domains like healthcare and well-being. This data scarcity can hinder a robot's ability to adapt to new situations. To address these challenges, causality provides a structured framework for understanding and modeling the underlying relationships between actions, events, and outcomes. By moving beyond mere pattern recognition, causality enables robots to make more explainable and generalizable decisions. This paper presents an exploratory causality-based analysis through a case study of an adaptive robotic coach delivering positive psychology exercises over four weeks in a workplace setting. The robotic coach autonomously adapts to multimodal human behaviors, such as facial valence and speech duration. By conducting both macro- and micro-level causal analyses, this study aims to gain deeper insights into how adaptability can enhance well-being during interactions. Ultimately, this research seeks to advance our understanding of how causality can help overcome challenges in HRI, particularly in real-world applications.
Submitted 4 March, 2025;
originally announced March 2025.
-
Beyond Vision: How Large Language Models Interpret Facial Expressions from Valence-Arousal Values
Authors:
Vaibhav Mehra,
Guy Laban,
Hatice Gunes
Abstract:
Large Language Models primarily operate through text-based inputs and outputs, yet human emotion is communicated through both verbal and non-verbal cues, including facial expressions. While Vision-Language Models analyze facial expressions from images, they are resource-intensive and may depend more on linguistic priors than visual understanding. To address this, this study investigates whether LLMs can infer affective meaning from dimensions of facial expressions (Valence and Arousal values, structured numerical representations) rather than using raw visual input. VA values were extracted using Facechannel from images of facial expressions and provided to LLMs in two tasks: (1) categorizing facial expressions into basic (on the IIMI dataset) and complex emotions (on the Emotic dataset) and (2) generating semantic descriptions of facial expressions (on the Emotic dataset). Results from the categorization task indicate that LLMs struggle to classify VA values into discrete emotion categories, particularly for emotions beyond basic polarities (e.g., happiness, sadness). However, in the semantic description task, LLMs produced textual descriptions that align closely with human-generated interpretations, demonstrating a stronger capacity for free-text affective inference of facial expressions.
Submitted 8 February, 2025;
originally announced February 2025.
-
Machine Learning Fairness for Depression Detection using EEG Data
Authors:
Angus Man Ho Kwok,
Jiaee Cheong,
Sinan Kalkan,
Hatice Gunes
Abstract:
This paper presents the very first attempt to evaluate machine learning fairness for depression detection using electroencephalogram (EEG) data. We conduct experiments using different deep learning architectures such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Unit (GRU) networks across three EEG datasets: Mumtaz, MODMA and Rest. We employ five different bias mitigation strategies at the pre-, in- and post-processing stages and evaluate their effectiveness. Our experimental results show that bias exists in existing EEG datasets and algorithms for depression detection, and different bias mitigation methods address bias at different levels across different fairness measures.
Submitted 30 January, 2025;
originally announced January 2025.
-
U-Fair: Uncertainty-based Multimodal Multitask Learning for Fairer Depression Detection
Authors:
Jiaee Cheong,
Aditya Bangar,
Sinan Kalkan,
Hatice Gunes
Abstract:
Machine learning bias in mental health is becoming an increasingly pertinent challenge. Despite promising efforts indicating that multitask approaches often work better than unitask approaches, minimal work has investigated the impact of multitask learning on performance and fairness in depression detection or leveraged it to achieve fairer prediction outcomes. In this work, we undertake a systematic investigation of using a multitask approach to improve performance and fairness for depression detection. We propose a novel gender-based task-reweighting method using uncertainty grounded in how the PHQ-8 questionnaire is structured. Our results indicate that, although a multitask approach improves performance and fairness compared to a unitask approach, the results are not always consistent and we see evidence of negative transfer and a reduction in the Pareto frontier, which is concerning given the high-stakes healthcare setting. Our proposed approach of gender-based reweighting with uncertainty improves performance and fairness and alleviates both challenges to a certain extent. Our findings on each PHQ-8 subitem task difficulty are also in agreement with the largest study conducted on the PHQ-8 subitem discrimination capacity, thus providing the very first tangible evidence linking ML findings with large-scale empirical population studies conducted on the PHQ-8.
Submitted 16 January, 2025;
originally announced January 2025.
-
GRACE: Generating Socially Appropriate Robot Actions Leveraging LLMs and Human Explanations
Authors:
Fethiye Irmak Dogan,
Umut Ozyurt,
Gizem Cinar,
Hatice Gunes
Abstract:
When operating in human environments, robots need to handle complex tasks while both adhering to social norms and accommodating individual preferences. For instance, based on common sense knowledge, a household robot can predict that it should avoid vacuuming during a social gathering, but it may still be uncertain whether it should vacuum before or after having guests. In such cases, integrating common-sense knowledge with human preferences, often conveyed through human explanations, is fundamental yet a challenge for existing systems. In this paper, we introduce GRACE, a novel approach addressing this while generating socially appropriate robot actions. GRACE leverages common sense knowledge from LLMs, and it integrates this knowledge with human explanations through a generative network. The bidirectional structure of GRACE enables robots to refine and enhance LLM predictions by utilizing human explanations and makes robots capable of generating such explanations for human-specified actions. Our evaluations show that integrating human explanations boosts GRACE's performance, where it outperforms several baselines and provides sensible explanations.
Submitted 3 April, 2025; v1 submitted 25 September, 2024;
originally announced September 2024.
-
Multimodal Gender Fairness in Depression Prediction: Insights on Data from the USA & China
Authors:
Joseph Cameron,
Jiaee Cheong,
Micol Spitale,
Hatice Gunes
Abstract:
Social agents and robots are increasingly being used in wellbeing settings. However, a key challenge is that these agents and robots typically rely on machine learning (ML) algorithms to detect and analyse an individual's mental wellbeing. The problem of bias and fairness in ML algorithms is becoming an increasingly greater source of concern. In concurrence, existing literature has also indicated that mental health conditions can manifest differently across genders and cultures. We hypothesise that the representation of features (acoustic, textual, and visual) and their inter-modal relations would vary among subjects from different cultures and genders, thus impacting the performance and fairness of various ML models. We present the very first evaluation of multimodal gender fairness in depression manifestation by undertaking a study on two different datasets from the USA and China. We undertake thorough statistical and ML experimentation and repeat the experiments for several different algorithms to ensure that the results are not algorithm-dependent. Our findings indicate that though there are differences between both datasets, it is not conclusive whether this is due to the difference in depression manifestation as hypothesised or other external factors such as differences in data collection methodology. Our findings further motivate a call for a more consistent and culturally aware data collection process in order to address the problem of ML bias in depression detection and to promote the development of fairer agents and robots for wellbeing.
Submitted 7 August, 2024;
originally announced August 2024.
-
ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions
Authors:
Micol Spitale,
Maria Teresa Parreira,
Maia Stiber,
Minja Axelsson,
Neval Kara,
Garima Kankariya,
Chien-Ming Huang,
Malte Jung,
Wendy Ju,
Hatice Gunes
Abstract:
Despite the recent advancements in robotics and machine learning (ML), the deployment of autonomous robots in our everyday lives is still an open challenge. This is due to multiple reasons among which are their frequent mistakes, such as interrupting people or having delayed responses, as well as their limited ability to understand human speech, i.e., failure in tasks like transcribing speech to text. These mistakes may disrupt interactions and negatively influence human perception of these robots. To address this problem, robots need to have the ability to detect human-robot interaction (HRI) failures. The ERR@HRI 2024 challenge tackles this by offering a benchmark multimodal dataset of robot failures during human-robot interactions (HRI), encouraging researchers to develop and benchmark multimodal machine learning models to detect these failures. We created a dataset featuring multimodal non-verbal interaction data, including facial, speech, and pose features from video clips of interactions with a robotic coach, annotated with labels indicating the presence or absence of robot mistakes, user awkwardness, and interaction ruptures, allowing for the training and evaluation of predictive models. Challenge participants have been invited to submit their multimodal ML models for detection of robot errors and to be evaluated against various performance metrics such as accuracy, precision, recall, F1 score, with and without a margin of error reflecting the time-sensitivity of these metrics. The results of this challenge will help the research field in better understanding the robot failures in human-robot interactions and designing autonomous robots that can mitigate their own errors after successfully detecting them.
Submitted 8 July, 2024;
originally announced July 2024.
-
Past, Present, and Future: A Survey of The Evolution of Affective Robotics For Well-being
Authors:
Micol Spitale,
Minja Axelsson,
Sooyeon Jeong,
Paige Tuttosı,
Caitlin A. Stamatis,
Guy Laban,
Angelica Lim,
Hatice Gunes
Abstract:
Recent research in affective robots has recognized their potential in supporting human well-being. Due to rapidly developing affective and artificial intelligence technologies, this field of research has undergone explosive expansion and advancement in recent years. In order to develop a deeper understanding of recent advancements, we present a systematic review of the past 10 years of research in affective robotics for wellbeing. In this review, we identify the domains of well-being that have been studied, the methods used to investigate affective robots for well-being, and how these have evolved over time. We also examine the evolution of the multifaceted research topic from three lenses: technical, design, and ethical. Finally, we discuss future opportunities for research based on the gaps we have identified in our review -- proposing pathways to take affective robotics from the past and present to the future. The results of our review are of interest to human-robot interaction and affective computing researchers, as well as clinicians and well-being professionals who may wish to examine and incorporate affective robotics in their practices.
Submitted 2 August, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Small but Fair! Fairness for Multimodal Human-Human and Robot-Human Mental Wellbeing Coaching
Authors:
Jiaee Cheong,
Micol Spitale,
Hatice Gunes
Abstract:
In recent years, the affective computing (AC) and human-robot interaction (HRI) research communities have put fairness at the centre of their research agenda. However, none of the existing work has addressed the problem of machine learning (ML) bias in HRI settings. In addition, many of the current datasets for AC and HRI are "small", making ML bias and debias analysis challenging. This paper presents the first work to explore ML bias analysis and mitigation for three small multimodal datasets collected within both human-human and robot-human wellbeing coaching settings. The contributions of this work include: i) being the first to explore the problem of ML bias and fairness within HRI settings; ii) providing a multimodal analysis evaluated via modelling performance and fairness metrics across both high and low-level features and proposing a simple and effective data augmentation strategy (MixFeat) to debias the small datasets presented within this paper; and iii) conducting extensive experimentation and analyses to reveal ML fairness insights unique to AC and HRI research in order to distill a set of recommendations to aid AC and HRI researchers to be more engaged with fairness-aware ML-based research.
Submitted 15 May, 2024;
originally announced July 2024.
-
LEXI: Large Language Models Experimentation Interface
Authors:
Guy Laban,
Tomer Laban,
Hatice Gunes
Abstract:
The recent developments in Large Language Models (LLMs) mark a significant moment in the research and development of social interactions with artificial agents. These agents are widely deployed in a variety of settings, with potential impact on users. However, the study of social interactions with agents powered by LLMs is still emerging, limited by access to the technology and to data, the absence of standardised interfaces, and challenges to establishing controlled experimental setups using the currently available business-oriented platforms. To address these gaps, we developed LEXI, LLMs Experimentation Interface, an open-source tool enabling the deployment of artificial agents powered by LLMs in social interaction behavioural experiments. Using a graphical interface, LEXI allows researchers to build agents and deploy them in experimental setups along with forms and questionnaires while collecting interaction logs and self-reported data. The outcomes of usability testing indicate LEXI's broad utility, high usability and minimum mental workload requirement, with distinctive benefits observed across disciplines. A proof-of-concept study exploring the tool's efficacy in evaluating social HAIs was conducted, resulting in high-quality data. A comparison of empathetic versus neutral agents indicated that people perceive empathetic agents as more social, and write longer and more positive messages towards them.
Submitted 2 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Graph in Graph Neural Network
Authors:
Jiongshu Wang,
Jing Yang,
Jiankang Deng,
Hatice Gunes,
Siyang Song
Abstract:
Existing Graph Neural Networks (GNNs) are limited to processing graphs in which each vertex is represented by a vector or a single value, which limits their capability to represent complex objects. In this paper, we propose the first GNN (called Graph in Graph Neural (GIG) Network) which can process graph-style data (called GIG sample) whose vertices are further represented by graphs. Given a set of graphs or a data sample whose components can be represented by a set of graphs (called multi-graph data sample), our GIG network starts with a GIG sample generation (GSG) module which encodes the input as a GIG sample, where each GIG vertex includes a graph. Then, a set of GIG hidden layers are stacked, with each consisting of: (1) a GIG vertex-level updating (GVU) module that individually updates the graph in every GIG vertex based on its internal information; and (2) a global-level GIG sample updating (GGU) module that updates graphs in all GIG vertices based on their relationships, making the updated GIG vertices global context-aware. This way, both internal cues within the graph contained in each GIG vertex and the relationships among GIG vertices can be utilized for downstream tasks. Experimental results demonstrate that our GIG network generalizes well not only to various generic graph analysis tasks but also to real-world multi-graph data analysis (e.g., human skeleton video-based action recognition), achieving new state-of-the-art results on 13 out of 14 evaluated datasets. Our code is publicly available at https://github.com/wangjs96/Graph-in-Graph-Neural-Network.
Submitted 30 June, 2024;
originally announced July 2024.
-
Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction
Authors:
Micol Spitale,
Jiaee Cheong,
Hatice Gunes
Abstract:
Recent studies show bias in many machine learning models for depression detection, but bias in LLMs for this task remains unexplored. This work presents the first attempt to investigate the degree of gender bias present in existing LLMs (ChatGPT, LLaMA 2, and Bard) using both quantitative and qualitative approaches. From our quantitative evaluation, we found that ChatGPT performs the best across various performance metrics and LLaMA 2 outperforms other LLMs in terms of group fairness metrics. As qualitative fairness evaluation remains an open research question, we propose several strategies (e.g., word count, thematic analysis) to investigate whether and how a qualitative evaluation can provide valuable insights for bias analysis beyond what is possible with quantitative evaluation. We found that ChatGPT consistently provides a more comprehensive, well-reasoned explanation for its prediction compared to LLaMA 2. We have also identified several themes adopted by LLMs to qualitatively evaluate gender fairness. We hope our results can be used as a stepping stone towards future attempts at improving qualitative evaluation of fairness for LLMs, especially for high-stakes tasks such as depression detection.
Submitted 14 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Feature Aggregation with Latent Generative Replay for Federated Continual Learning of Socially Appropriate Robot Behaviours
Authors:
Nikhil Churamani,
Saksham Checker,
Fethiye Irmak Dogan,
Hao-Tien Lewis Chiang,
Hatice Gunes
Abstract:
It is critical for robots to explore Federated Learning (FL) settings where several robots, deployed in parallel, can learn independently while also sharing their learning with each other. This collaborative learning in real-world environments requires social robots to adapt dynamically to changing and unpredictable situations and varying task settings. Our work contributes to addressing these challenges by exploring a simulated living room environment where robots need to learn the social appropriateness of their actions. First, we propose Federated Root (FedRoot) averaging, a novel weight aggregation strategy which disentangles feature learning across clients from individual task-based learning. Second, to adapt to challenging environments, we extend FedRoot to Federated Latent Generative Replay (FedLGR), a novel Federated Continual Learning (FCL) strategy that uses FedRoot-based weight aggregation and embeds each client with a generator model for pseudo-rehearsal of learnt feature embeddings to mitigate forgetting in a resource-efficient manner. Our results show that FedRoot-based methods offer competitive performance while also resulting in a sizeable reduction in resource consumption (up to 86% for CPU usage and up to 72% for GPU usage). Additionally, our results demonstrate that FedRoot-based FCL methods outperform other methods while also offering an efficient solution (up to 84% CPU and 92% GPU usage reduction), with FedLGR providing the best results across evaluations.
Submitted 21 February, 2025; v1 submitted 16 March, 2024;
originally announced May 2024.
-
A Longitudinal Study of Child Wellbeing Assessment via Online Interactions with a Social Robot
Authors:
Nida Itrat Abbasi,
Guy Laban,
Tamsin Ford,
Peter B. Jones,
Hatice Gunes
Abstract:
Socially Assistive Robots are studied in different Child-Robot Interaction settings. However, logistical constraints limit accessibility, particularly affecting timely support for mental wellbeing. In this work, we have investigated whether online interactions with a robot can be used for the assessment of mental wellbeing in children. The children (N=40, 20 girls and 20 boys; 8-13 years) interacted with the Nao robot (30-45 mins) over three sessions, at least a week apart. Audio-visual recordings were collected throughout the sessions that concluded with the children answering user perception questionnaires pertaining to their anxiety towards the robot, and the robot's abilities. We divided the participants into three wellbeing clusters (low, med and high tertiles) using their responses to the Short Moods and Feelings Questionnaire (SMFQ) and further analysed how their wellbeing and their perceptions of the robot changed over the wellbeing tertiles, across sessions and across participants' gender. Our primary findings suggest that (I) online mediated-interactions with robots can be effective in assessing children's mental wellbeing over time, and (II) children's overall perception of the robot either improved or remained consistent across time. Supplementary exploratory analyses have also revealed that the gender of the children affected their wellbeing assessments with interactions effectively distinguishing between varying levels of wellbeing for both boys and girls for the first session and only for boys during the second session. The analyses have also revealed that girls have a higher opinion of the robot as a confidante as compared with boys. Findings from this work affirm the potential of using online mediated interactions with robots for the assessment of the mental wellbeing of children.
Submitted 14 February, 2025; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Federated Learning of Socially Appropriate Agent Behaviours in Simulated Home Environments
Authors:
Saksham Checker,
Nikhil Churamani,
Hatice Gunes
Abstract:
As social robots become increasingly integrated into daily life, ensuring their behaviours align with social norms is crucial. For their widespread open-world application, it is important to explore Federated Learning (FL) settings where individual robots can learn about their unique environments while also learning from each other's experiences. In this paper, we present a novel FL benchmark that evaluates different strategies, using multi-label regression objectives, where each client individually learns to predict the social appropriateness of different robot actions while also sharing their learning with others. Furthermore, by splitting the training data by different contexts such that each client incrementally learns across contexts, we present a novel Federated Continual Learning (FCL) benchmark that adapts FL-based methods to use state-of-the-art Continual Learning (CL) methods to continually learn socially appropriate agent behaviours under different contextual settings. Federated Averaging (FedAvg) of weights emerges as a robust FL strategy, while rehearsal-based FCL enables incrementally learning the social appropriateness of robot actions across contextual splits.
Submitted 12 March, 2024;
originally announced March 2024.
-
Technology-assisted Journal Writing for Improving Student Mental Wellbeing: Humanoid Robot vs. Voice Assistant
Authors:
Batuhan Sayis,
Hatice Gunes
Abstract:
Conversational agents have the potential to improve student mental wellbeing while assisting them in self-disclosure activities such as journalling. Their embodiment might have an effect on what students disclose, how they disclose it, and students' overall adherence to the disclosure activity. However, the effect of embodiment in the context of agent-assisted journal writing has not been studied. Therefore, this study aims to investigate the viability of using social robots (SR) and voice assistants (VA) for eliciting rich disclosures in journal writing that contribute to mental health status improvement in students over time. Forty-two undergraduate and graduate students participated in the study, which assessed mood changes (via Brief Mood Introspection Scale, BMIS), level of subjective self-disclosure (via Subjective Self-Disclosure Questionnaire, SSDQ), and perceptions toward the agents (via Robot Social Attributes Scale, RoSAS) with and without agent (SR or VA) assisted journal writing. Results suggest that only in the robot condition are there mood improvements, higher levels of disclosure, and positive perceptions over time in technology-assisted journal writing. Our results suggest that robot-assisted journal writing has some advantages over voice-assistant-assisted journal writing for eliciting rich disclosures that contribute to mental health status improvement in students over time.
Submitted 8 March, 2024;
originally announced March 2024.
-
Robotising Psychometrics: Validating Wellbeing Assessment Tools in Child-Robot Interactions
Authors:
Nida Itrat Abbasi,
Guy Laban,
Tamsin Ford,
Peter B Jones,
Hatice Gunes
Abstract:
The interdisciplinary nature of Child-Robot Interaction (CRI) fosters incorporating measures and methodologies from many established domains. However, when applying CRI approaches to sensitive areas of health and wellbeing, caution is critical in adapting metrics to retain their safety standards and ensure accurate utilisation. In this work, we conducted a secondary analysis of previous empirical work, investigating the reliability and construct validity of established psychological questionnaires such as the Short Moods and Feelings Questionnaire (SMFQ) and three subscales (generalised anxiety, panic and low mood) of the Revised Child Anxiety and Depression Scale (RCADS) within a CRI setting for the assessment of mental wellbeing. Through confirmatory principal component analysis, we have observed that these measures are reliable and valid in the context of CRI. Furthermore, our analysis revealed that scales communicated by a robot demonstrated a better fit than when self-reported, underscoring the efficiency and effectiveness of robot-mediated psychological assessments in these settings. Nevertheless, we have also observed variations in item contributions to the main factor, suggesting potential areas of examination and revision (e.g., relating to physiological changes, inactivity and cognitive demands) when used in CRI. Findings from this work highlight the importance of verifying the reliability and validity of standardised metrics and assessment tools when employed in CRI settings, in order to avoid misinterpretations and misrepresentations.
Submitted 28 February, 2024;
originally announced February 2024.
-
Appropriateness of LLM-equipped Robotic Well-being Coach Language in the Workplace: A Qualitative Evaluation
Authors:
Micol Spitale,
Minja Axelsson,
Hatice Gunes
Abstract:
Robotic coaches have recently been investigated to promote mental well-being in various contexts such as workplaces and homes. With the widespread use of Large Language Models (LLMs), HRI researchers are called to consider language appropriateness when using such generated language for robotic mental well-being coaches in the real world. Therefore, this paper presents the first work that investigates the language appropriateness of a robotic mental well-being coach in the workplace. To this end, we conducted an empirical study that involved 17 employees who interacted over 4 weeks with a robotic mental well-being coach equipped with LLM-based capabilities. After the study, we individually interviewed them and conducted a 1.5-hour focus group with 11 of them. The focus group consisted of: i) an ice-breaking activity, ii) evaluation of robotic coach language appropriateness in various scenarios, and iii) listing shoulds and shouldn'ts for designing appropriate robotic coach language for mental well-being. From our qualitative evaluation, we found that a language-appropriate robotic coach should (1) ask deep questions which explore feelings of the coachees, rather than superficial questions, (2) express and show emotional and empathic understanding of the context, and (3) not make any assumptions without clarifying with follow-up questions to avoid bias and stereotyping. These results can inform the design of language-appropriate robotic coaches to promote mental well-being in real-world contexts.
Submitted 26 January, 2024;
originally announced January 2024.
-
REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge
Authors:
Siyang Song,
Micol Spitale,
Cheng Luo,
Cristina Palmero,
German Barquero,
Hengde Zhu,
Sergio Escalera,
Michel Valstar,
Tobias Baur,
Fabien Ringeval,
Elisabeth Andre,
Hatice Gunes
Abstract:
In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour. Developing a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions from a previously unseen speaker behaviour is therefore a challenging task. Following the successful organisation of the first REACT challenge (REACT 2023), this edition of the challenge (REACT 2024) employs a subset used by the previous challenge, which contains segmented 30-second dyadic interaction clips originally recorded as part of the NOXI and RECOLA datasets, encouraging participants to develop and benchmark ML models that can generate multiple appropriate facial reactions (including facial image sequences and their attributes) given an input conversational partner's stimulus under various dyadic video conference scenarios. This paper presents: (i) the guidelines of the REACT 2024 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2024.
Submitted 10 January, 2024;
originally announced January 2024.
-
"Oh, Sorry, I Think I Interrupted You": Designing Repair Strategies for Robotic Longitudinal Well-being Coaching
Authors:
Minja Axelsson,
Micol Spitale,
Hatice Gunes
Abstract:
Robotic well-being coaches have been shown to successfully promote people's mental well-being. To provide successful coaching, a robotic coach should have the capability to repair the mistakes it makes. Past investigations of robot mistakes are limited to game or task-based, one-off and in-lab studies. This paper presents a 4-phase design process to design repair strategies for robotic longitudinal well-being coaching with the involvement of real-world stakeholders: 1) designing repair strategies with a professional well-being coach; 2) a longitudinal study with the involvement of experienced users (i.e., who had already interacted with a robotic coach) to investigate the repair strategies defined in (1); 3) a design workshop with users from the study in (2) to gather their perspectives on the robotic coach's repair strategies; 4) discussing the results obtained in (2) and (3) with the mental well-being professional to reflect on how to design repair strategies for robotic coaching. Our results show that users have different expectations for a robotic coach than a human coach, which influences how repair strategies should be designed. We show that different repair strategies (e.g., apologizing, explaining, or repairing empathically) are appropriate in different scenarios, and that preferences for repair strategies change during longitudinal interactions with the robotic coach.
Submitted 8 January, 2024;
originally announced January 2024.
-
Uncertainty-based Fairness Measures
Authors:
Selim Kuzucu,
Jiaee Cheong,
Hatice Gunes,
Sinan Kalkan
Abstract:
Unfair predictions of machine learning (ML) models impede their broad acceptance in real-world settings. Tackling this arduous challenge first necessitates defining what it means for an ML model to be fair. This has been addressed by the ML community with various measures of fairness that depend on the prediction outcomes of the ML models, either at the group level or the individual level. These fairness measures are limited in that they utilize point predictions, neglecting their variances, or uncertainties, making them susceptible to noise, missingness and shifts in data. In this paper, we first show that an ML model may appear to be fair with existing point-based fairness measures but biased against a demographic group in terms of prediction uncertainties. Then, we introduce new fairness measures based on different types of uncertainties, namely, aleatoric uncertainty and epistemic uncertainty. We demonstrate on many datasets that (i) our uncertainty-based measures are complementary to existing measures of fairness, and (ii) they provide more insights about the underlying issues leading to bias.
Submitted 29 August, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
VITA: A Multi-modal LLM-based System for Longitudinal, Autonomous, and Adaptive Robotic Mental Well-being Coaching
Authors:
Micol Spitale,
Minja Axelsson,
Hatice Gunes
Abstract:
Recently, several works have explored if and how robotic coaches can promote and maintain mental well-being in different settings. However, findings from these studies revealed that these robotic coaches are not ready to be used and deployed in real-world settings due to several limitations that span from technological challenges to coaching success. To overcome these challenges, this paper presents VITA, a novel multi-modal LLM-based system that allows robotic coaches to autonomously adapt to the coachee's multi-modal behaviours (facial valence and speech duration) and deliver coaching exercises in order to promote mental well-being in adults. We identified five objectives that correspond to the challenges in the recent literature, and we show how the VITA system addresses these via experimental validations that include one in-lab pilot study (N=4) that enabled us to test different robotic coach configurations (pre-scripted, generic, and adaptive models) and inform its design for using it in the real world, and one real-world study (N=17) conducted in a workplace over 4 weeks. Our results show that: (i) coachees perceived the VITA adaptive and generic configurations more positively than the pre-scripted one, and they felt understood and heard by the adaptive robotic coach, (ii) the VITA adaptive robotic coach kept learning successfully by personalising to each coachee over time and did not detect any interaction ruptures during the coaching, (iii) coachees had significant mental well-being improvements via the VITA-based robotic coach practice. The code for the VITA system is openly available via: https://github.com/Cambridge-AFAR/VITA-system.
Submitted 15 December, 2023;
originally announced December 2023.
-
A Systematic Review on Reproducibility in Child-Robot Interaction
Authors:
Micol Spitale,
Rebecca Stower,
Elmira Yadollahi,
Maria Teresa Parreira,
Nida Itrat Abbasi,
Iolanda Leite,
Hatice Gunes
Abstract:
Research reproducibility - i.e., rerunning analyses on original data to replicate the results - is paramount for guaranteeing scientific validity. However, reproducibility is often very challenging, especially in research fields where multi-disciplinary teams are involved, such as child-robot interaction (CRI). This paper presents a systematic review of the last three years (2020-2022) of research in CRI under the lens of reproducibility, by analysing the field for transparency in reporting. Across a total of 325 studies, we found deficiencies in reporting demographics (e.g. age of participants), study design and implementation (e.g. length of interactions), and open data (e.g. maintaining an active code repository). From this analysis, we distill a set of guidelines and provide a checklist to systematically report CRI studies to help and guide research to improve reproducibility in CRI and beyond.
Submitted 4 September, 2023;
originally announced September 2023.
-
MRecGen: Multimodal Appropriate Reaction Generator
Authors:
Jiaqi Xu,
Cheng Luo,
Weicheng Xie,
Linlin Shen,
Xiaofeng Liu,
Lu Liu,
Hatice Gunes,
Siyang Song
Abstract:
Verbal and non-verbal human reaction generation is a challenging task, as different reactions could be appropriate for responding to the same behaviour. This paper proposes the first multiple and multimodal (verbal and nonverbal) appropriate human reaction generation framework that can generate appropriate and realistic human-style reactions (displayed in the form of synchronised text, audio and video streams) in response to an input user behaviour. This novel technique can be applied to various human-computer interaction scenarios by generating appropriate virtual agent/robot behaviours. Our demo is available at https://github.com/SSYSteve/MRecGen.
Submitted 5 July, 2023;
originally announced July 2023.
-
REACT2023: the first Multi-modal Multiple Appropriate Facial Reaction Generation Challenge
Authors:
Siyang Song,
Micol Spitale,
Cheng Luo,
German Barquero,
Cristina Palmero,
Sergio Escalera,
Michel Valstar,
Tobias Baur,
Fabien Ringeval,
Elisabeth Andre,
Hatice Gunes
Abstract:
The Multi-modal Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing and to foster collaboration among the audio, visual, and audio-visual affective computing communities, to compare the relative merits of the approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) novelties, contributions and guidelines of the REACT2023 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2023.
Submitted 11 June, 2023;
originally announced June 2023.
-
ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions
Authors:
Cheng Luo,
Siyang Song,
Weicheng Xie,
Micol Spitale,
Zongyuan Ge,
Linlin Shen,
Hatice Gunes
Abstract:
In dyadic interaction, predicting the listener's facial reactions is challenging as different reactions could be appropriate in response to the same speaker's behaviour. Previous approaches predominantly treated this task as an interpolation or fitting problem, emphasizing deterministic outcomes but ignoring the diversity and uncertainty of human facial reactions. Furthermore, these methods often failed to model short-range and long-range dependencies within the interaction context, leading to issues in the synchrony and appropriateness of the generated facial reactions. To address these limitations, this paper reformulates the task as an extrapolation or prediction problem, and proposes a novel framework (called ReactFace) to generate multiple different but appropriate facial reactions from a speaker behaviour rather than merely replicating the corresponding listener facial behaviours. Our ReactFace generates multiple different but appropriate photo-realistic human facial reactions by: (i) learning an appropriate facial reaction distribution representing multiple different but appropriate facial reactions; and (ii) synchronizing the generated facial reactions with the speaker's verbal and non-verbal behaviours at each time stamp, resulting in realistic 2D facial reaction sequences. Experimental results demonstrate the effectiveness of our approach in generating multiple diverse, synchronized, and appropriate facial reactions from each speaker's behaviour. The quality of the generated facial reactions is intimately tied to the speaker's speech and facial expressions, achieved through our novel speaker-listener interaction modules. Our code is made publicly available at https://github.com/lingjivoo/ReactFace.
Submitted 3 November, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation
Authors:
Tong Xu,
Micol Spitale,
Hao Tang,
Lu Liu,
Hatice Gunes,
Siyang Song
Abstract:
Generating facial reactions in a human-human dyadic interaction is complex and highly dependent on the context, since more than one facial reaction can be appropriate for the speaker's behaviour. This has challenged existing machine learning (ML) methods, whose training strategies force models to reproduce a specific (not multiple) facial reaction from each input speaker behaviour. This paper proposes the first multiple appropriate facial reaction generation framework that re-formulates the one-to-many mapping facial reaction generation problem as a one-to-one mapping problem. This means that we approach this problem by considering the generation of a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training. Our model consists of a perceptual processor, a cognitive processor, and a motor processor. The motor processor is implemented with a novel Reversible Multi-dimensional Edge Graph Neural Network (REGNN). This allows us to obtain a distribution of appropriate real facial reactions during the training process, enabling the cognitive processor to be trained to predict the appropriate facial reaction distribution. At the inference stage, the REGNN decodes an appropriate facial reaction by using this distribution as input. Experimental results demonstrate that our approach outperforms existing models in generating more appropriate, realistic, and synchronized facial reactions. The improved performance is largely attributed to the proposed appropriate facial reaction distribution learning strategy and the use of a REGNN. The code is available at https://github.com/TongXu-05/REGNN-Multiple-Appropriate-Facial-Reaction-Generation.
Submitted 16 November, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Continual Facial Expression Recognition: A Benchmark
Authors:
Nikhil Churamani,
Tolga Dimlioglu,
German I. Parisi,
Hatice Gunes
Abstract:
Understanding human affective behaviour, especially in the dynamics of real-world settings, requires Facial Expression Recognition (FER) models to continuously adapt to individual differences in user expression, contextual attributions, and the environment. Current (deep) Machine Learning (ML)-based FER approaches pre-trained in isolation on benchmark datasets fail to capture the nuances of real-world interactions where data is available only incrementally, acquired by the agent or robot during interactions. New learning comes at the cost of previous knowledge, resulting in catastrophic forgetting. Lifelong or Continual Learning (CL), on the other hand, enables adaptability in agents by being sensitive to changing data distributions, integrating new information without interfering with previously learnt knowledge. Positing CL as an effective learning paradigm for FER, this work presents the Continual Facial Expression Recognition (ConFER) benchmark that evaluates popular CL techniques on FER tasks. It presents a comparative analysis of several CL-based approaches on popular FER datasets such as CK+, RAF-DB, and AffectNet and presents strategies for a successful implementation of ConFER for Affective Computing (AC) research. CL techniques, under different learning settings, are shown to achieve state-of-the-art (SOTA) performance across several datasets, thus motivating a discussion on the benefits of applying CL principles towards human behaviour understanding, particularly from facial expressions, as well as the challenges entailed.
Submitted 10 May, 2023;
originally announced May 2023.
-
Affective Robotics For Wellbeing: A Scoping Review
Authors:
Micol Spitale,
Hatice Gunes
Abstract:
Affective robotics research aims to better understand human social and emotional signals to improve human-robot interaction (HRI), and has been widely used during the last decade in multiple application fields. Past works have demonstrated the potential of using affective robots (i.e., robots that can recognize, interpret, process, or simulate human affects) for healthcare applications, especially wellbeing. This paper systematically reviews the last decade (January 2013 - May 2022) of HRI literature to identify the main features of affective robotics for wellbeing. Specifically, we focused on the types of wellbeing goals affective robots addressed, their platforms, their shapes, their affective capabilities, and their autonomy in the surveyed studies. Based on this analysis, we list a set of recommendations that emerged, and we also present a research agenda to provide future directions to researchers in the field of affective robotics for wellbeing.
Submitted 4 April, 2023;
originally announced April 2023.
-
Affective Computing for Human-Robot Interaction Research: Four Critical Lessons for the Hitchhiker
Authors:
Hatice Gunes,
Nikhil Churamani
Abstract:
Social Robotics and Human-Robot Interaction (HRI) research relies on different Affective Computing (AC) solutions for sensing, perceiving and understanding human affective behaviour during interactions. This may include utilising off-the-shelf affect perception models that are pre-trained on popular affect recognition benchmarks and directly applied to situated interactions. However, the conditions in situated human-robot interactions differ significantly from the training data and settings of these models. Thus, there is a need to deepen our understanding of how AC solutions can be best leveraged, customised and applied for situated HRI. This paper, while critiquing the existing practices, presents four critical lessons to be noted by the hitchhiker when applying AC for HRI research. These lessons conclude that: (i) The six basic emotions categories are irrelevant in situated interactions, (ii) Affect recognition accuracy (%) improvements are unimportant, (iii) Affect recognition does not generalise across contexts, and (iv) Affect recognition alone is insufficient for adaptation and personalisation. By describing the background and the context for each lesson, and demonstrating how these lessons have been learnt, this paper aims to enable the hitchhiker to successfully and insightfully leverage AC solutions for advancing HRI research.
Submitted 31 March, 2023;
originally announced March 2023.
-
Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?
Authors:
Siyang Song,
Micol Spitale,
Yiming Luo,
Batuhan Bal,
Hatice Gunes
Abstract:
According to the Stimulus Organism Response (SOR) theory, all human behavioral reactions are stimulated by context, where people will process the received stimulus and produce an appropriate reaction. This implies that in a specific context for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, where a broad spectrum of listeners' non-verbal reactions might be appropriate for responding to a specific speaker behaviour. There already exists a body of work that investigated the problem of automatically generating an appropriate reaction for a given input. However, none attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and evaluate the appropriateness of those reactions using objective measures. This paper starts by defining the facial Multiple Appropriate Reaction Generation (fMARG) task for the first time in the literature and proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions. The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
Submitted 23 March, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
GRATIS: Deep Learning Graph Representation with Task-specific Topology and Multi-dimensional Edge Features
Authors:
Siyang Song,
Yuxin Song,
Cheng Luo,
Zhiyuan Song,
Selim Kuzucu,
Xi Jia,
Zhijiang Guo,
Weicheng Xie,
Linlin Shen,
Hatice Gunes
Abstract:
Graphs are powerful for representing various types of real-world data. The topology (edges' presence) and edges' features of a graph decide the message passing mechanism among vertices within the graph. While most existing approaches only manually define a single-value edge to describe the connectivity or strength of association between a pair of vertices, task-specific and crucial relationship cues may be disregarded by such manually defined topology and single-value edge features. In this paper, we propose the first general graph representation learning framework (called GRATIS) which can generate a strong graph representation with a task-specific topology and task-specific multi-dimensional edge features from any arbitrary input. To learn each edge's presence and multi-dimensional feature, our framework takes both the corresponding vertex pair and their global contextual information into consideration, enabling the generated graph representation to have a globally optimal message passing mechanism for different down-stream tasks. The principled investigation results achieved for various graph analysis tasks on 11 graph and non-graph datasets show that our GRATIS not only largely enhances pre-defined graphs but also learns a strong graph representation for non-graph data, with clear performance improvements on all tasks. In particular, the learned topology and multi-dimensional edge features provide complementary task-related cues for graph analysis tasks. Our framework is effective, robust and flexible, and is a plug-and-play module that can be combined with different backbones and Graph Neural Networks (GNNs) to generate a task-specific graph representation from various graph and non-graph data. Our code is made publicly available at https://github.com/SSYSteve/Learning-Graph-Representation-with-Task-specific-Topology-and-Multi-dimensional-Edge-Features.
Submitted 19 November, 2022;
originally announced November 2022.
-
An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition
Authors:
Rongfan Liao,
Siyang Song,
Hatice Gunes
Abstract:
Personality determines a wide variety of human daily and working behaviours, and is crucial for understanding human internal and external states. In recent years, a large number of automatic personality computing approaches have been developed to predict either the apparent personality or self-reported personality of the subject based on non-verbal audio-visual behaviours. However, the majority of them suffer from complex and dataset-specific pre-processing steps and model training tricks. In the absence of a standardized benchmark with consistent experimental settings, it is not only impossible to fairly compare the real performances of these personality computing models, but the models are also difficult to reproduce. In this paper, we present the first reproducible audio-visual benchmarking framework to provide a fair and consistent evaluation of eight existing personality computing models (e.g., audio, visual and audio-visual) and seven standard deep learning models on both self-reported and apparent personality recognition tasks. Building upon a set of benchmarked models, we also investigate the impact of two previously-used long-term modelling strategies for summarising short-term/frame-level predictions on personality computing results. The results conclude: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, show more reliability than self-reported ones; (ii) visual models frequently achieved better performance than audio models on personality recognition; (iii) non-verbal behaviours contribute differently in predicting different personality traits; and (iv) our reproduced personality computing models generally performed worse than their originally reported results. Our benchmark is publicly available at https://github.com/liaorongfan/DeepPersonality.
Submitted 5 February, 2024; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Automatic Context-Driven Inference of Engagement in HMI: A Survey
Authors:
Hanan Salam,
Oya Celiktutan,
Hatice Gunes,
Mohamed Chetouani
Abstract:
An integral part of seamless human-human communication is engagement, the process by which two or more participants establish, maintain, and end their perceived connection. Therefore, to develop successful human-centered human-machine interaction applications, automatic engagement inference is one of the tasks required to achieve engaging interactions between humans and machines, and to make machines attuned to their users, hence enhancing user satisfaction and technology acceptance. Several factors contribute to engagement state inference, which include the interaction context and interactants' behaviours and identity. Indeed, engagement is a multi-faceted and multi-modal construct that requires high accuracy in the analysis and interpretation of contextual, verbal and non-verbal cues. Thus, the development of an automated and intelligent system that accomplishes this task has proven challenging so far. This paper presents a comprehensive survey of previous work in engagement inference for human-machine interaction, covering interdisciplinary definitions, engagement components and factors, publicly available datasets, ground truth assessment, and the most commonly used features and methods, serving as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability. An in-depth review across embodied and disembodied interaction modes, and an emphasis on the interaction context in which engagement perception modules are integrated, set the presented survey apart from existing surveys.
Submitted 30 September, 2022;
originally announced September 2022.
-
Participant Perceptions of a Robotic Coach Conducting Positive Psychology Exercises: A Qualitative Analysis
Authors:
Minja Axelsson,
Nikhil Churamani,
Atahan Caldir,
Hatice Gunes
Abstract:
This paper presents a qualitative analysis of participants' perceptions of a robotic coach conducting Positive Psychology exercises, providing insights for the future design of robotic coaches. Participants (n = 20) took part in a single-session (avg. 31 ± 10 minutes) Human-Robot Interaction study in a laboratory setting. We created the design of the robotic coach, and its affective adaptation, based on user-centred design research and collaboration with a professional coach. We transcribed post-study participant interviews and conducted a Thematic Analysis. We discuss the results of that analysis, presenting aspects participants found particularly helpful (e.g., the robot asked the correct questions and helped them think of new positive things in their life), and what should be improved (e.g., the robot's utterance content should be more responsive). We found that participants had no clear preference for affective adaptation or no affective adaptation, which may be due to both positive and negative user perceptions being heightened in the case of adaptation. Based on our qualitative analysis, we highlight insights for the future design of robotic coaches, and areas for future investigation (e.g., examining how participants with different personality traits, or participants experiencing isolation, could benefit from an interaction with a robotic coach).
Submitted 17 February, 2025; v1 submitted 8 September, 2022;
originally announced September 2022.
-
Robots as Mental Well-being Coaches: Design and Ethical Recommendations
Authors:
Minja Axelsson,
Micol Spitale,
Hatice Gunes
Abstract:
The last decade has shown a growing interest in robots as well-being coaches. However, insightful guidelines for the design of robots as coaches to promote mental well-being have not yet been proposed. This paper details design and ethical recommendations based on a qualitative analysis drawing on a grounded theory approach, conducted through a three-step iterative design process that included user-centred design studies involving robotic well-being coaches, namely: (1) a user-centred design study conducted with 11 participants consisting of both prospective users who had participated in a Brief Solution-Focused Practice study with a human coach, as well as coaches of different disciplines, (2) semi-structured individual interview data gathered from 20 participants attending a Positive Psychology intervention study with the robotic well-being coach Pepper, and (3) a user-centred design study conducted with 3 participants of the Positive Psychology study as well as 2 relevant well-being coaches. After conducting a thematic analysis and a qualitative analysis, we collated the data gathered into convergent and divergent themes, and we distilled from those results a set of design guidelines and ethical considerations. Our findings can inform researchers and roboticists on the key aspects to take into account when designing robotic mental well-being coaches.
Submitted 30 January, 2024; v1 submitted 31 August, 2022;
originally announced August 2022.