Showing 1–7 of 7 results for author: Elbanna, G

  1. arXiv:2410.05423  [pdf, other]

    cs.SD cs.AI eess.AS

    Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments

    Authors: Sagarika Alavilli, Annesya Banerjee, Gasser Elbanna, Annika Magaro

    Abstract: Current state-of-the-art speech recognition models are trained to map acoustic signals into sub-lexical units. While these models demonstrate superior performance, they remain vulnerable to out-of-distribution conditions such as background noise and speech augmentations. In this work, we hypothesize that incorporating speaker representations during speech recognition can enhance model robustness t…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Submitted to ICASSP 2025

  2. arXiv:2409.09511  [pdf, other]

    cs.SD cs.AI eess.AS

    Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features

    Authors: Satvik Dixit, Daniel M. Low, Gasser Elbanna, Fabio Catania, Satrajit S. Ghosh

    Abstract: Pre-trained deep learning embeddings have consistently shown superior performance over handcrafted acoustic features in speech emotion recognition (SER). However, unlike acoustic features with clear physical meaning, these embeddings lack clear interpretability. Explaining these embeddings is crucial for building trust in healthcare and security applications and advancing the scientific understand…

    Submitted 14 September, 2024; originally announced September 2024.

  3. arXiv:2406.10401  [pdf]

    eess.AS cs.AI cs.SD

    Evaluating Speaker Identity Coding in Self-supervised Models and Humans

    Authors: Gasser Elbanna

    Abstract: Speaker identity plays a significant role in human communication and is being increasingly used in societal applications, many through advances in machine learning. Speaker identity perception is an essential cognitive phenomenon that can be broadly reduced to two main tasks: recognizing a voice or discriminating between voices. Several studies have attempted to identify acoustic correlates of ide…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Master's Thesis

  4. arXiv:2406.06341  [pdf, other]

    cs.SD cs.AI eess.AS eess.SP

    Predicting Heart Activity from Speech using Data-driven and Knowledge-based features

    Authors: Gasser Elbanna, Zohreh Mostaani, Mathew Magimai.-Doss

    Abstract: Accurately predicting heart activity and other biological signals is crucial for diagnosis and monitoring. Given that speech is an outcome of multiple physiological systems, a significant body of work studied the acoustic correlates of heart activity. Recently, self-supervised models have excelled in speech-related tasks compared to traditional acoustic methods. However, the robustness of data-dri…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  5. arXiv:2211.06646  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Efficient Speech Quality Assessment using Self-supervised Framewise Embeddings

    Authors: Karl El Hajal, Zihan Wu, Neil Scheidwasser-Clow, Gasser Elbanna, Milos Cernak

    Abstract: Automatic speech quality assessment is essential for audio researchers, developers, speech and language pathologists, and system quality engineers. The current state-of-the-art systems are based on framewise speech features (hand-engineered or learnable) combined with time dependency modeling. This paper proposes an efficient system with results comparable to the best performing model in the Confe…

    Submitted 12 November, 2022; originally announced November 2022.

  6. arXiv:2206.12038  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

    Authors: Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak

    Abstract: Methods for extracting audio and speech features have been studied since pioneering work on spectrum analysis decades ago. Recent efforts are guided by the ambition to develop general-purpose audio representations. For example, deep neural networks can extract optimal embeddings if they are trained on large audio datasets. This work extends existing methods based on self-supervised learning by boo…

    Submitted 25 October, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Submitted to HEAR-PMLR 2021

  7. arXiv:2203.16637  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load

    Authors: Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

    Abstract: As a neurophysiological response to threat or adverse conditions, stress can affect cognition, emotion and behaviour with potentially detrimental effects on health in the case of sustained exposure. Since the affective content of speech is inherently modulated by an individual's physical and mental state, a substantial body of research has been devoted to the study of paralinguistic correlates of…

    Submitted 30 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Submitted to InterSpeech 2022