
Showing 1–6 of 6 results for author: Taherian, H

Searching in archive eess.
  1. arXiv:2503.17886  [pdf, other]

    cs.SD eess.AS

    Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition

    Authors: Yufeng Yang, Hassan Taherian, Vahid Ahmadi Kalkhorani, DeLiang Wang

    Abstract: Despite the tremendous success of automatic speech recognition (ASR) with the introduction of deep learning, its performance is still unsatisfactory in many real-world multi-talker scenarios. Speaker separation excels in separating individual talkers but, as a frontend, it introduces processing artifacts that degrade the ASR backend trained on clean speech. As a result, mainstream robust ASR syste…

    Submitted 22 March, 2025; originally announced March 2025.

  2. arXiv:2311.08630  [pdf, other]

    eess.AS cs.SD

    Multi-channel Conversational Speaker Separation via Neural Diarization

    Authors: Hassan Taherian, DeLiang Wang

    Abstract: When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments, continuous speaker separation (CSS) is commonly employed. However, CSS requires a short separation window to avoid many speakers inside the window and sequential…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 10 pages, 4 figures

  3. arXiv:2301.06458  [pdf, other]

    eess.AS cs.SD

    Multi-resolution location-based training for multi-channel continuous speech separation

    Authors: Hassan Taherian, DeLiang Wang

    Abstract: The performance of automatic speech recognition (ASR) systems severely degrades when multi-talker speech overlap occurs. In meeting environments, speech separation is typically performed to improve the robustness of ASR systems. Recently, location-based training (LBT) was proposed as a new training criterion for multi-channel talker-independent speaker separation. Assuming fixed array geometry, LB…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: Submitted to ICASSP 2023

  4. arXiv:2211.02944  [pdf, other]

    eess.AS cs.SD

    Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

    Authors: Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka

    Abstract: Personalized speech enhancement (PSE) models achieve promising results compared with unconditional speech enhancement models due to their ability to remove interfering speech in addition to background noise. Unlike unconditional speech enhancement, causal PSE models may occasionally remove the target speech by mistake. The PSE models also tend to leak interfering speech when the target speaker is…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  5. arXiv:2110.10330  [pdf, other]

    eess.AS cs.SD

    One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

    Authors: Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang

    Abstract: With the recent surge in the use of video conferencing tools, providing high-quality speech signals and accurate captions has become essential to conduct day-to-day business or connect with friends and families. Single-channel personalized speech enhancement (PSE) methods show promising results compared with the unconditional speech enhancement (SE) methods in these scenarios due to their ability to r…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  6. arXiv:2110.04289  [pdf, other]

    eess.AS cs.SD

    Location-based training for multi-channel talker-independent speaker separation

    Authors: Hassan Taherian, Ke Tan, DeLiang Wang

    Abstract: Permutation-invariant training (PIT) is a dominant approach for addressing the permutation ambiguity problem in talker-independent speaker separation. Leveraging spatial information afforded by microphone arrays, we propose a new training approach to resolving permutation ambiguities for multi-channel speaker separation. The proposed approach, named location-based training (LBT), assigns speakers…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022