Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Zhang, Wangyou; Saijo, Kohei; Jung, Jee-weon; Li, Chenda; Watanabe, Shinji; Qian, Yanmin

doi:10.21437/Interspeech.2024-1266

arXiv:2406.04269 (eess)

[Submitted on 6 Jun 2024]

Abstract: Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into the under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.

Comments:	5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2406.04269 [eess.AS]
	(or arXiv:2406.04269v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2406.04269
Related DOI:	https://doi.org/10.21437/Interspeech.2024-1266

Submission history

From: Wangyou Zhang
[v1] Thu, 6 Jun 2024 17:20:21 UTC (517 KB)
