Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models

Hasumi, Takuya; Nakamura, Tomohiko; Takamune, Norihiro; Saruwatari, Hiroshi; Kitamura, Daichi; Takahashi, Yu; Kondo, Kazunobu

arXiv:2109.00704 (cs)

[Submitted on 2 Sep 2021]


Abstract: Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art multichannel audio source separation methods; it estimates source power with deep neural networks (DNNs). DNN-based power estimation works well for sounds whose timbres are similar to those in the DNN training data. However, the sounds to which IDLMA is applied do not always have such timbres, and this timbral mismatch degrades the performance of IDLMA. To tackle this problem, we focus on the blind source separation counterpart of IDLMA, independent low-rank matrix analysis (ILRMA). It uses nonnegative matrix factorization (NMF) as the source model, which can capture source spectral components that appear only in the target mixture by using the low-rank structure of the source spectrogram as a clue. We therefore extend the DNN-based source model to encompass the NMF-based source model on the basis of the product-of-experts concept, and we call the result the product of source models (PoSM). For the proposed PoSM-based IDLMA, we derive a computationally efficient parameter estimation algorithm based on the majorization-minimization (MM) principle. Experimental evaluations show the effectiveness of the proposed method.
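
As a rough illustration of the NMF source model mentioned in the abstract, the following minimal sketch fits a low-rank model to a nonnegative power spectrogram by minimizing the Itakura-Saito divergence with multiplicative updates of the kind obtained from a majorization-minimization argument. It is not the PoSM algorithm of the paper and includes neither the DNN prior nor the multichannel demixing step. The function names `is_divergence` and `is_nmf`, the choice of divergence, and all parameter values are assumptions made for this example.

```python
import numpy as np


def is_divergence(V, X):
    """Itakura-Saito divergence between positive arrays V and X."""
    R = V / X
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_bases, n_iter=100, eps=1e-12, seed=0):
    """Fit V (freq x frames, strictly positive) with a rank-n_bases model W @ H.

    Hypothetical helper for illustration only; the updates are the
    square-root multiplicative rules commonly derived via MM for IS-NMF.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))    # spectral bases
    H = rng.uniform(0.1, 1.0, size=(n_bases, n_frames))  # activations
    costs = []
    for _ in range(n_iter):
        X = W @ H + eps  # current model spectrogram
        W *= np.sqrt(((V / X**2) @ H.T) / ((1.0 / X) @ H.T))
        X = W @ H + eps  # refresh the model after updating W
        H *= np.sqrt((W.T @ (V / X**2)) / (W.T @ (1.0 / X)))
        costs.append(is_divergence(V, W @ H + eps))
    return W, H, costs


if __name__ == "__main__":
    # Synthetic rank-3 "spectrogram" (assumed data, not from the paper).
    rng = np.random.default_rng(1)
    V = rng.uniform(0.1, 1.0, (20, 3)) @ rng.uniform(0.1, 1.0, (3, 30)) + 1e-3
    W, H, costs = is_nmf(V, n_bases=3, n_iter=50)
    print("first cost:", costs[0], "last cost:", costs[-1])
```

Because the bases `W` and activations `H` are estimated from the observed spectrogram alone, a model of this kind can adapt to spectral components that only appear in the mixture at hand, which is the property the abstract attributes to the NMF-based source model.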

Comments:	8 pages, 5 figures, accepted for Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA ASC 2021)
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2109.00704 [cs.SD]
	(or arXiv:2109.00704v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2109.00704

Submission history

From: Takuya Hasumi
[v1] Thu, 2 Sep 2021 04:31:15 UTC (1,392 KB)
