Showing 1–6 of 6 results for author: Farhadipour, A

Searching in archive eess.
  1. arXiv:2506.16969  [pdf, ps, other]

    eess.AS cs.SD

    State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition

    Authors: Aref Farhadipour, Homayoon Beigi, Volker Dellwo, Hadi Veisi

    Abstract: Whispered speech recognition presents significant challenges for conventional automatic speech recognition systems, particularly when combined with dialect variation. However, a method that solves this problem efficiently, with a limited dataset and a low processing load, is beneficial. This paper proposes a solution using a Mamba-based state-space model and four fine-tuned self-supervised models…

    Submitted 27 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

    Comments: paper is in 4+1 pages

  2. arXiv:2506.00733  [pdf, ps, other]

    eess.AS cs.SD

    Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis

    Authors: Miao Zhang, Aref Farhadipour, Annie Baker, Jiachen Ma, Bogdan Pricop, Eleanor Chodroff

    Abstract: With its crosslinguistic and cross-speaker diversity, the Mozilla Common Voice Corpus (CV) has been a valuable resource for multilingual speech technology and holds tremendous potential for research in crosslinguistic phonetics and speech sciences. Properly accounting for speaker variation is, however, key to the theoretical and statistical bases of speech research. While CV provides a client ID a…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted for Interspeech 2025

  3. arXiv:2503.06805  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts

    Authors: Aref Farhadipour, Hossein Ranjbar, Masoumeh Chapariniya, Teodora Vukovic, Sarah Ebling, Volker Dellwo

    Abstract: Emotion recognition and sentiment analysis are pivotal tasks in speech and language processing, particularly in real-world scenarios involving multi-party, conversational data. This paper presents a multimodal approach to tackle these challenges on a well-known dataset. We propose a system that integrates four key modalities/channels using pre-trained models: RoBERTa for text, Wav2Vec2 for speech,…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 5 pages

  4. arXiv:2409.00562  [pdf, ps, other]

    eess.AS cs.CV cs.MM cs.SD

    Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification

    Authors: Aref Farhadipour, Masoumeh Chapariniya, Teodora Vukovic, Volker Dellwo

    Abstract: Multimodal learning involves integrating information from various modalities to enhance learning and comprehension. We compare three modality fusion strategies in person identification and verification by processing two modalities: voice and face. In this paper, a one-dimensional convolutional neural network is employed for x-vector extraction from voice, while the pre-trained VGGFace2 network and…

    Submitted 2 November, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: This paper was accepted at the ICNLSP2024 conference

  5. arXiv:2407.21211  [pdf, ps, other]

    eess.AS cs.SD

    Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition

    Authors: Aref Farhadipour, Homa Asadi, Volker Dellwo

    Abstract: In automatic speech recognition, any factor that alters the acoustic properties of speech can pose a challenge to the system's performance. This paper presents a novel approach for automatic whispered speech recognition in the Irish dialect using the self-supervised WavLM model. Conventional automatic speech recognition systems often fail to accurately recognise whispered speech due to its distinc…

    Submitted 3 November, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: This paper was accepted at the ICCKE2024 conference

  6. arXiv:2307.03296  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment

    Authors: Aref Farhadipour, Hadi Veisi

    Abstract: Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform some tasks by receiving voice commands…

    Submitted 20 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: 12 pages, 8 figures. Iran J Comput Sci (2024)