Showing 1–14 of 14 results for author: Praveen, R G

  1. arXiv:2503.12261  [pdf, other]

    cs.CV cs.SD eess.AS

    United we stand, Divided we fall: Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition in Valence-Arousal Space

    Authors: R. Gnana Praveen, Jahangir Alam, Eric Charton

    Abstract: Audio and visual modalities are two predominant contact-free channels in videos, which are often expected to carry a complementary relationship with each other. However, they may not always complement each other, resulting in poor audio-visual feature representations. In this paper, we introduce Gated Recursive Joint Cross Attention (GRJCA) using a gating mechanism that can adaptively choose the m…

    Submitted 21 March, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: Achieved 2nd place in the valence-arousal challenge. Submission to CVPR 2025 Workshop

  2. arXiv:2403.19554  [pdf, other]

    cs.CV

    Cross-Attention is Not Always Needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Jahangir Alam

    Abstract: In video-based emotion recognition, audio and visual modalities are often expected to have a complementary relationship, which is widely explored using cross-attention. However, they may also exhibit weak complementary relationships, resulting in poor representations of audio-visual features, thus degrading the performance of the system. To address this issue, we propose Dynamic Cross-Attention (D…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at IEEE ICME2024

  3. arXiv:2403.13659  [pdf, other]

    cs.CV cs.SD eess.AS

    Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Jahangir Alam

    Abstract: Though multimodal emotion recognition has achieved significant progress over recent years, the potential of rich synergic relationships across the modalities is not fully exploited. In this paper, we introduce Recursive Joint Cross-Modal Attention (RJCMA) to effectively capture both intra- and inter-modal relationships across audio, visual, and text modalities for dimensional emotion recognition.…

    Submitted 13 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2403.04661  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Dynamic Cross Attention for Audio-Visual Person Verification

    Authors: R. Gnana Praveen, Jahangir Alam

    Abstract: Although person or identity verification has been predominantly explored using individual modalities such as face and voice, audio-visual fusion has recently shown immense potential to outperform unimodal approaches. Audio and visual modalities are often expected to pose strong complementary relationships, which play a crucial role in effective audio-visual fusion. However, they may not always st…

    Submitted 22 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to FG2024

  5. arXiv:2403.04654  [pdf, other]

    cs.CV cs.SD eess.AS

    Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention

    Authors: R. Gnana Praveen, Jahangir Alam

    Abstract: Person or identity verification has been recently gaining a lot of attention using audio-visual fusion as faces and voices share close associations with each other. Conventional approaches based on audio-visual fusion rely on score-level or early feature-level fusion techniques. Though existing approaches showed improvement over unimodal systems, the potential of audio-visual fusion for person ver…

    Submitted 26 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to FG2024

  6. arXiv:2309.16569  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Audio-Visual Speaker Verification via Joint Cross-Attention

    Authors: R. Gnana Praveen, Jahangir Alam

    Abstract: Speaker verification has been widely explored using speech signals, which has shown significant improvement using deep models. Recently, there has been a surge in exploring faces and voices as they can offer more complementary and comprehensive information than relying only on a single modality of speech signals. Though current methods in the literature on the fusion of faces and voices have shown…

    Submitted 28 September, 2023; originally announced September 2023.

  7. arXiv:2304.07958  [pdf, other]

    cs.CV cs.SD eess.AS

    Recursive Joint Attention for Audio-Visual Fusion in Regression based Emotion Recognition

    Authors: R Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: In video-based emotion recognition (ER), it is important to effectively leverage the complementary relationship among audio (A) and visual (V) modalities, while retaining the intra-modal characteristics of individual modalities. In this paper, a recursive joint attention model is proposed along with long short-term memory (LSTM) modules for the fusion of vocal and facial expressions in regression-…

    Submitted 16 April, 2023; originally announced April 2023.

  8. arXiv:2209.09068  [pdf, other]

    cs.CV

    Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention

    Authors: R Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Automatic emotion recognition (ER) has recently gained a lot of interest due to its potential in many real-world applications. In this context, multimodal approaches have been shown to improve performance (over unimodal approaches) by combining diverse and complementary sources of information, providing some robustness to noisy and missing modalities. In this paper, we focus on dimensional ER based…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.14779, arXiv:2111.05222

  9. arXiv:2203.14779  [pdf, other]

    cs.CV cs.HC cs.SD eess.AS

    A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Patrick Cardinal, Eric Granger

    Abstract: Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively lever…

    Submitted 6 July, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.05222

  10. arXiv:2111.05222  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated uni-modal approaches. The most effective techniques for multimodal emotion recognition efficiently leverage diverse and complementary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature…

    Submitted 6 July, 2024; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: Accepted in FG2021

  11. arXiv:2104.06524  [pdf, other]

    cs.CV cs.LG

    Holistic Guidance for Occluded Person Re-Identification

    Authors: Madhu Kiran, R Gnana Praveen, Le Thanh Nguyen-Meidine, Soufiane Belharbi, Louis-Antoine Blais-Morin, Eric Granger

    Abstract: In real-world video surveillance applications, person re-identification (ReID) suffers from the effects of occlusions and detection errors. Despite recent advances, occlusions continue to corrupt the features extracted by state-of-the-art CNN backbones, and thereby deteriorate the accuracy of ReID systems. To address this issue, methods in the literature use an additional costly process such as pose e…

    Submitted 22 July, 2023; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: British Machine Vision Conference (BMVC) 2021

  12. arXiv:2101.09858  [pdf, other]

    cs.CV

    Weakly Supervised Learning for Facial Behavior Analysis : A Review

    Authors: R. Gnana Praveen, Patrick Cardinal, Eric Granger

    Abstract: In recent years, there has been a shift in facial behavior analysis from laboratory-controlled conditions to challenging in-the-wild conditions due to the superior performance of deep learning based approaches for many real-world applications. However, the performance of deep learning approaches relies on the amount of training data. One of the major problems with data acquisition is th…

    Submitted 14 October, 2024; v1 submitted 24 January, 2021; originally announced January 2021.

    Comments: Provides a link to a constantly updated list of papers: https://github.com/praveena2j/awesome-Weakly-Supervised-Facial-Behavior-Analysis

  13. arXiv:2008.06392  [pdf, other]

    cs.CV

    Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Estimation of pain intensity from facial expressions captured in videos has immense potential for health care applications. Given the challenges related to subjective variations of facial expressions and operational capture conditions, the accuracy of state-of-the-art DL models for recognizing facial expressions may decline. Domain adaptation has been widely explored to alleviate the problem o…

    Submitted 6 July, 2024; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: Under review for a journal. arXiv admin note: text overlap with arXiv:1910.08173

  14. arXiv:1910.08173  [pdf, other]

    cs.CV

    Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos

    Authors: R. Gnana Praveen, Eric Granger, Patrick Cardinal

    Abstract: Automatic pain assessment has important potential diagnostic value for populations that are incapable of articulating their pain experiences. As one of the dominant nonverbal channels for eliciting pain expression events, facial expressions have been widely investigated for estimating the pain intensity of individuals. However, using state-of-the-art deep learning (DL) models in real-world pain…

    Submitted 6 July, 2024; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: Accepted in FG 2020