Showing 1–13 of 13 results for author: Kheir, Y E

Searching in archive eess.
  1. arXiv:2506.07722  [pdf]

    cs.SD eess.AS

    Towards a Unified Benchmark for Arabic Pronunciation Assessment: Quranic Recitation as Case Study

    Authors: Yassine El Kheir, Omnia Ibrahim, Amit Meghanani, Nada Almarwani, Hawau Olamide Toyin, Sadeen Alharbi, Modar Alfadly, Lamya Alkanhal, Ibrahim Selim, Shehab Elbatal, Salima Mdhaffar, Thomas Hain, Yasser Hifny, Mostafa Shahin, Ahmed Ali

    Abstract: We present a unified benchmark for mispronunciation detection in Modern Standard Arabic (MSA) using Qur'anic recitation as a case study. Our approach lays the groundwork for advancing Arabic pronunciation assessment by providing a comprehensive pipeline that spans data processing, the development of a specialized phoneme set tailored to the nuances of MSA pronunciation, and the creation of the fir…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted Interspeech 2025 and ArabicNLP Shared Task 2025

  2. arXiv:2505.13930  [pdf, ps, other]

    cs.SD eess.AS

    BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

    Authors: Yassine El Kheir, Tim Polzehl, Sebastian Möller

    Abstract: We propose BiCrossMamba-ST, a robust framework for speech deepfake detection that leverages a dual-branch spectro-temporal architecture powered by bidirectional Mamba blocks and mutual cross-attention. By processing spectral sub-bands and temporal intervals separately and then integrating their representations, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech. In addition,…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted Interspeech 2025

  3. arXiv:2502.03559  [pdf, other]

    eess.AS cs.SD

    Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection

    Authors: Yassine El Kheir, Youness Samih, Suraj Maharjan, Tim Polzehl, Sebastian Möller

    Abstract: This paper conducts a comprehensive layer-wise analysis of self-supervised learning (SSL) models for audio deepfake detection across diverse contexts, including multilingual datasets (English, Chinese, Spanish), partial, song, and scene-based deepfake scenarios. By systematically evaluating the contributions of different transformer layers, we uncover critical insights into model behavior and perf…

    Submitted 7 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: Accepted to NAACL Findings 2025

  4. arXiv:2411.13424  [pdf]

    cs.SD cs.CL eess.AS

    CAFE A Novel Code switching Dataset for Algerian Dialect French and English

    Authors: Houssam Eddine-Othman Lachemat, Akli Abbas, Nourredine Oukas, Yassine El Kheir, Samia Haboussi, Absar Chowdhury Shammur

    Abstract: The paper introduces and publicly releases (data download link available after acceptance) CAFE, the first code-switching dataset between Algerian dialect, French, and English. The CAFE speech data is unique in that it (a) features a spontaneous speaking style in in vivo human-human conversation, capturing phenomena like code-switching and overlapping speech, and (b) addresses distinct linguistic challenges…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: 24 pages, submitted to TALLIP

  5. arXiv:2408.02430  [pdf, other]

    eess.AS

    Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic

    Authors: Yassine El Kheir, Hamdy Mubarak, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: This paper presents a novel Dialectal Sound and Vowelization Recovery framework, designed to recognize borrowed and dialectal sounds within phonologically diverse and dialect-rich languages that extend beyond their standard orthographic sound sets. The proposed framework uses a quantized input sequence with or without continuous pretrained self-supervised representations. We show the efficacy of t…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted ACL 2024 Main Conference

  6. arXiv:2406.16099  [pdf, other]

    cs.SD eess.AS

    Speech Representation Analysis based on Inter- and Intra-Model Similarities

    Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model similarity, independent of any external annotation an…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 5 pages, Accepted to appear in ICASSP XAI-SA Workshop

  7. arXiv:2310.13974  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Pronunciation Assessment -- A Review

    Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challeng…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 9 pages, accepted to EMNLP Findings

  8. arXiv:2309.07739  [pdf, other]

    cs.CL cs.SD eess.AS

    The complementary roles of non-verbal cues for Robust Pronunciation Assessment

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within the non-verbal cues. In this study, we propose a novel pronunciation assessment framework, IntraVerbalPA. The framework innovatively incorporates both fine-grained frame- and abstract utterance-level non-verba…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  9. arXiv:2309.07719  [pdf, other]

    cs.CL cs.SD eess.AS

    L1-aware Multilingual Mispronunciation Detection Framework

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The phonological discrepancies between a speaker's native (L1) and non-native (L2) languages serve as a major factor for mispronunciation. This paper introduces a novel multilingual MDD architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechani…

    Submitted 21 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  10. arXiv:2308.02503  [pdf, other]

    eess.AS cs.CL cs.SD

    MyVoice: Arabic Speech Resource Collaboration Platform

    Authors: Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and…

    Submitted 23 July, 2023; originally announced August 2023.

    Comments: 2 pages, accepted at InterSpeech23 Show and Tell Session

  11. arXiv:2306.01845  [pdf, other]

    cs.SD eess.AS

    Multi-View Multi-Task Representation Learning for Mispronunciation Detection

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The disparity in phonology between a learner's native (L1) and target (L2) languages poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by the lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple 'views' of the same input data assisted by auxiliary tasks to learn more distinctive phoneti…

    Submitted 7 August, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 5 pages, Accepted SLaTE23

  12. arXiv:2305.07445  [pdf, other]

    eess.AS cs.CL cs.SD

    QVoice: Arabic Speech Pronunciation Learning Application

    Authors: Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali

    Abstract: This paper introduces QVoice, a novel Arabic pronunciation learning application powered by an end-to-end mispronunciation detection and feedback generator module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence from regional dialects on their Modern Standard Arabic (MSA) pr…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 2 pages, Accepted InterSpeech23 Show & Tell Demo Session

    Journal ref: InterSpeech 2023

  13. arXiv:2211.00923  [pdf, other]

    cs.SD cs.CL eess.AS

    SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali, Hamdy Mubarak, Shazia Afzal

    Abstract: The lack of labeled second language (L2) speech data is a major challenge in designing mispronunciation detection models. We introduce SpeechBlender, a fine-grained data augmentation pipeline for generating mispronunciation errors to overcome such data scarcity. SpeechBlender uses a variety of masks to target different regions of phonetic units, and uses the mixing factors to linearly inte…

    Submitted 12 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages