
On the Validity of Head Motion Patterns as Generalisable Depression Biomarkers

Authors: Monika Gahalawat, Maneesh Bilalpur, Raul Fernandez Rojas, Jeffrey F. Cohn, Roland Goecke, Ramanathan Subramanian

Abstract: Depression is a debilitating mood disorder negatively impacting millions worldwide. While researchers have explored multiple verbal and non-verbal behavioural cues for automated depression assessment, head motion has received little attention thus far. Further, the common practice of validating machine learning models via a single dataset can limit model generalisability. This work examines the effectiveness and generalisability of models utilising elementary head motion units, termed kinemes, for depression severity estimation. Specifically, we consider three depression datasets from different western cultures (German: AVEC2013, Australian: Blackdog and American: Pitt datasets) with varied contextual and recording settings to investigate the generalisability of the derived kineme patterns via two methods: (i) k-fold cross-validation over individual/multiple datasets, and (ii) model reuse on other datasets. Evaluating classification and regression performance with classical machine learning methods, our results show that: (1) head motion patterns are efficient biomarkers for estimating depression severity, achieving highly competitive performance for both classification and regression tasks on a variety of datasets, including achieving the second best Mean Absolute Error (MAE) on the AVEC2013 dataset, and (2) kineme-based features are more generalisable than (a) raw head motion descriptors for binary severity classification, and (b) other visual behavioural cues for severity estimation (regression).

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2402.08837

Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues

Authors: Maneesh Bilalpur, Mert Inan, Dorsa Zeinali, Jeffrey F. Cohn, Malihe Alikhani

Abstract: Addressing the critical shortage of mental health resources for effective screening, diagnosis, and treatment remains a significant challenge. This scarcity underscores the need for innovative solutions, particularly in enhancing the accessibility and efficacy of therapeutic support. Embodied agents with advanced interactive capabilities emerge as a promising and cost-effective supplement to traditional caregiving methods. Crucial to these agents' effectiveness is their ability to simulate non-verbal behaviors, like backchannels, that are pivotal in establishing rapport and understanding in therapeutic contexts but remain under-explored. To improve the rapport-building capabilities of embodied agents we annotated backchannel smiles in videos of intimate face-to-face conversations over topics such as mental health, illness, and relationships. We hypothesized that both speaker and listener behaviors affect the duration and intensity of backchannel smiles. Using cues from speech prosody and language along with the demographics of the speaker and listener, we found them to contain significant predictors of the intensity of backchannel smiles. Based on our findings, we introduce backchannel smile production in embodied agents as a generation problem. Our attention-based generative model suggests that listener information offers performance improvements over the baseline speaker-centric generation approach. Conditioned generation using the significant predictors of smile intensity provides statistically significant improvements in empirical measures of generation quality. Our user study by transferring generated smiles to an embodied agent suggests that an agent with backchannel smiles is perceived to be more human-like and is an attractive alternative for non-personal conversations over an agent without backchannel smiles.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted to the Machine Learning for Cognitive and Mental Health Workshop at AAAI 2024

arXiv:2401.08164

EEG-based Cognitive Load Estimation of Acoustic Parameters for Data Sonification

Authors: Gulshan Sharma, Surbhi Madan, Maneesh Bilalpur, Abhinav Dhall, Ramanathan Subramanian

Abstract: Sonification is a data visualization technique which expresses data attributes via psychoacoustic parameters, which are non-speech audio signals used to convey information. This paper investigates the binary estimation of cognitive load induced by psychoacoustic parameters conveying the focus level of an astronomical image via Electroencephalogram (EEG) embeddings. Employing machine learning and deep learning methodologies, we demonstrate that EEG signals are reliable for (a) binary estimation of cognitive load, (b) isolating easy vs difficult visual-to-auditory perceptual mappings, and (c) capturing perceptual similarities among psychoacoustic parameters. Our key findings reveal that (1) EEG embeddings can reliably measure cognitive load, achieving a peak F1-score of 0.98; (2) Extreme focus levels are easier to detect via auditory mappings than intermediate ones, and (3) psychoacoustic parameters inducing comparable cognitive load levels tend to generate similar EEG encodings.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2310.07093

Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning

Authors: Arushi Sharma, Abhibha Gupta, Maneesh Bilalpur

Abstract: To advance argumentative stance prediction as a multimodal problem, the First Shared Task in Multimodal Argument Mining hosted stance prediction in crucial social topics of gun control and abortion. Our exploratory study attempts to evaluate the necessity of images for stance prediction in tweets and compare out-of-the-box text-based large-language models (LLM) in few-shot settings against fine-tuned unimodal and multimodal models. Our work suggests an ensemble of fine-tuned text-based language models (0.817 F1-score) outperforms both the multimodal (0.677 F1-score) and text-based few-shot prediction using a recent state-of-the-art LLM (0.550 F1-score). In addition to the differences in performance, our findings suggest that the multimodal models tend to perform better when image content is summarized as natural language over their native pixel structure, and that using in-context examples improves few-shot performance of LLMs.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2006.13386

Gender and Emotion Recognition from Implicit User Behavior Signals

Authors: Maneesh Bilalpur, Seyed Mostafa Kia, Mohan Kankanhalli, Ramanathan Subramanian

Abstract: This work explores the utility of implicit behavioral cues, namely, Electroencephalogram (EEG) signals and eye movements for gender recognition (GR) and emotion recognition (ER) from psychophysical behavior. Specifically, the examined cues are acquired via low-cost, off-the-shelf sensors. 28 users (14 male) recognized emotions from unoccluded (no mask) and partially occluded (eye or mouth masked) emotive faces; their EEG responses contained gender-specific differences, while their eye movements were characteristic of the perceived facial emotions. Experimental results reveal that (a) reliable GR and ER is achievable with EEG and eye features, (b) differential cognitive processing of negative emotions is observed for females and (c) eye gaze-based gender differences manifest under partial face occlusion, as typified by the eye and mouth mask conditions.

Submitted 23 June, 2020; originally announced June 2020.

Comments: Under consideration for publication in IEEE Trans. Affective Computing

arXiv:1809.04507

Investigating the generalizability of EEG-based Cognitive Load Estimation Across Visualizations

Authors: Viral Parekh, Maneesh Bilalpur, Sharavan Kumar, Stefan Winkler, C V Jawahar, Ramanathan Subramanian

Abstract: We examine if EEG-based cognitive load (CL) estimation is generalizable across the character, spatial pattern, bar graph and pie chart-based visualizations for the n-back task. CL is estimated via two recent approaches: (a) Deep convolutional neural network, and (b) Proximal support vector machines. Experiments reveal that CL estimation suffers across visualizations, motivating the need for effective machine learning techniques to benchmark visual interface usability for a given analytic task.

Submitted 12 September, 2018; originally announced September 2018.

arXiv:1808.06055

doi: 10.1145/3242969.3243016

EEG-based Evaluation of Cognitive Workload Induced by Acoustic Parameters for Data Sonification

Authors: Maneesh Bilalpur, Mohan Kankanhalli, Stefan Winkler, Ramanathan Subramanian

Abstract: Data Visualization has been receiving growing attention recently, with ubiquitous smart devices designed to render information in a variety of ways. However, while evaluations of visual tools for their interpretability and intuitiveness have been commonplace, not much research has been devoted to other forms of data rendering, e.g., sonification. This work is the first to automatically estimate the cognitive load induced by different acoustic parameters considered for sonification in prior studies. We examine cognitive load via (a) perceptual data-sound mapping accuracies of users for the different acoustic parameters, (b) cognitive workload impressions explicitly reported by users, and (c) their implicit EEG responses compiled during the mapping task. Our main findings are that (i) low cognitive load-inducing (i.e., more intuitive) acoustic parameters correspond to higher mapping accuracies, (ii) EEG spectral power analysis reveals higher α band power for low cognitive load parameters, implying a congruent relationship between explicit and implicit user responses, and (iii) cognitive load classification with EEG features achieves a peak F1-score of 0.64, confirming that reliable workload estimation is achievable with user EEG data compiled using wearable sensors.

Submitted 18 August, 2018; originally announced August 2018.

Comments: Accepted for publication in the proceedings of the 20th ACM International Conference on Multimodal Interaction, Colorado, USA

arXiv:1708.08735

Gender and Emotion Recognition with Implicit User Signals

Authors: Maneesh Bilalpur, Seyed Mostafa Kia, Manisha Chawla, Tat-Seng Chua, Ramanathan Subramanian

Abstract: We examine the utility of implicit user behavioral signals captured using low-cost, off-the-shelf devices for anonymous gender and emotion recognition. A user study designed to examine male and female sensitivity to facial emotions confirms that females recognize (especially negative) emotions quicker and more accurately than men, mirroring prior findings. Implicit viewer responses in the form of EEG brain signals and eye movements are then examined for existence of (a) emotion and gender-specific patterns from event-related potentials (ERPs) and fixation distributions and (b) emotion and gender discriminability. Experiments reveal that (i) gender and emotion-specific differences are observable from ERPs, (ii) multiple similarities exist between explicit responses gathered from users and their implicit behavioral signals, and (iii) significantly above-chance (≈70%) gender recognition is achievable on comparing emotion-specific EEG responses; gender differences are encoded best for anger and disgust. Also, fairly modest valence (positive vs negative emotion) recognition is achieved with EEG and eye-based features.

Submitted 29 August, 2017; originally announced August 2017.

Comments: To be published in the Proceedings of the 19th International Conference on Multimodal Interaction, 2017

ACM Class: H.5.2; I.3.6; H.1.2

arXiv:1708.08729

Discovering Gender Differences in Facial Emotion Recognition via Implicit Behavioral Cues

Authors: Maneesh Bilalpur, Seyed Mostafa Kia, Tat-Seng Chua, Ramanathan Subramanian

Abstract: We examine the utility of implicit behavioral cues in the form of EEG brain signals and eye movements for gender recognition (GR) and emotion recognition (ER). Specifically, the examined cues are acquired via low-cost, off-the-shelf sensors. We asked 28 viewers (14 female) to recognize emotions from unoccluded (no mask) as well as partially occluded (eye and mouth masked) emotive faces. Obtained experimental results reveal that (a) reliable GR and ER is achievable with EEG and eye features, (b) differential cognitive processing especially for negative emotions is observed for males and females and (c) some of these cognitive differences manifest under partial face occlusion, as typified by the eye and mouth mask conditions.

Submitted 29 August, 2017; originally announced August 2017.

Comments: To be published in the Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

ACM Class: H.5.2; I.3.6; H.1.2

Showing 1–9 of 9 results for author: Bilalpur, M