Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction

Fan, Cunhang; Gao, Youdian; Pan, Zexu; Zhang, Jingjing; Zhang, Hongyu; Zhang, Jie; Lv, Zhao

arXiv:2501.01673 (cs)

[Submitted on 3 Jan 2025]

Abstract: The recent rapid development of auditory attention decoding (AAD) offers the possibility of using electroencephalography (EEG) as auxiliary information for target speaker extraction. However, effectively modeling long sequences of speech and resolving the identity of the target speaker from EEG signals remain major challenges. In this paper, an improved feature extraction network (IFENet) is proposed for neuro-oriented target speaker extraction, which mainly consists of a speech encoder with dual-path Mamba and an EEG encoder with Kolmogorov-Arnold Networks (KAN). We propose SpeechBiMamba, which uses dual-path Mamba to model local and global speech sequences and extract speech features. In addition, we propose EEGKAN to effectively extract EEG features that are closely related to the auditory stimuli and to locate the target speaker through the subject's attention information. Experiments on the KUL and AVED datasets show that IFENet outperforms the state-of-the-art model, achieving 36% and 29% relative improvements, respectively, in terms of scale-invariant signal-to-distortion ratio (SI-SDR) under an open evaluation condition.
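
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the commonly used SI-SDR definition: the estimate is projected onto the reference signal, and the energy of that projection is compared with the energy of the residual. It is not taken from the paper; the function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made for illustration, and the authors' evaluation code may differ.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    `estimate` and `reference` are equal-length sequences of samples.
    `eps` is an assumed small constant that avoids division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target is distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)

    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    # Synthetic signals, used only to show the call; not data from the paper.
    reference = [math.sin(0.1 * n) for n in range(200)]
    noisy = [r + 0.1 * math.cos(0.37 * n) for n, r in enumerate(reference)]
    print(si_sdr(noisy, reference))
    # Rescaling the estimate leaves the score essentially unchanged.
    print(si_sdr([3.0 * x for x in noisy], reference))
```

Because the target is obtained by projection, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, which is what "scale-invariant" refers to.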

Comments:	Accepted by ICASSP 2025
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.01673 [cs.SD]
	(or arXiv:2501.01673v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2501.01673

Submission history

From: Youdian Gao
[v1] Fri, 3 Jan 2025 07:27:51 UTC (389 KB)
