A Survey of Deep Learning for Complex Speech Spectrograms

Xie, Yuying; Tan, Zheng-Hua

Abstract: Recent advancements in deep learning have significantly impacted the field of speech signal processing, particularly in the analysis and manipulation of complex spectrograms. This survey provides a comprehensive overview of state-of-the-art techniques that use deep neural networks to process complex spectrograms, which encapsulate both magnitude and phase information. We begin by introducing complex spectrograms and their associated features for various speech processing tasks. Next, we explore the key components and architectures of complex-valued neural networks, which are specifically designed to handle complex-valued data and have been applied to complex spectrogram processing. We then discuss training strategies and loss functions tailored to neural networks that process and model complex spectrograms. The survey further examines key applications, including phase retrieval, speech enhancement, and speech separation, where deep learning has achieved significant progress by leveraging complex spectrograms or their derived feature representations. Additionally, we examine the intersection of complex spectrograms with generative models. This survey aims to serve as a valuable resource for researchers and practitioners in the field of speech signal processing and complex-valued neural networks.
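As a small illustration of what "complex spectrogram" means here, the sketch below computes a short-time Fourier transform of a toy signal and splits each complex bin into magnitude and phase. It is not taken from the survey: the function name `stft`, the parameters `frame_len` and `hop`, the Hann window, and the test signal are all assumptions chosen for illustration, and the plain-Python DFT is written for readability rather than speed.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Return a complex spectrogram: a list of frames, each a list of complex bins.

    Hypothetical helper for illustration only; real systems would use an FFT.
    """
    # Periodic Hann window applied to every frame before the DFT.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


# Toy input: a sinusoid that completes two cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft(signal)

# Each complex bin carries both magnitude and phase.
magnitude = [[abs(c) for c in frame] for frame in spec]
phase = [[cmath.phase(c) for c in frame] for frame in spec]

# Magnitude and phase together recover the complex value exactly.
recon = [[cmath.rect(m, p) for m, p in zip(mf, pf)]
         for mf, pf in zip(magnitude, phase)]
```

Keeping only `magnitude` would discard the information held in `phase`, so the complex values could no longer be recovered. That is why the survey treats the complex spectrogram, rather than the magnitude alone, as the object of interest.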

Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2505.08694 [eess.AS]
	(or arXiv:2505.08694v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2505.08694

