Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

Kawanaka, Masaki; Koizumi, Yuma; Miyazaki, Ryoichi; Yatabe, Kohei

[Submitted on 14 Feb 2020]

Abstract: Improving the subjective sound quality of enhanced signals is one of the most important goals in speech enhancement. To evaluate subjective quality, several perceptually-motivated objective sound quality assessment (OSQA) methods have been proposed, such as PESQ (perceptual evaluation of speech quality). However, such measures usually cannot be used directly for training a deep neural network (DNN) because popular OSQAs are non-differentiable with respect to the DNN parameters. Therefore, a previous study proposed approximating the OSQA score with an auxiliary DNN so that its gradient can be used for training the primary DNN. One problem with this approach is that training becomes unstable because of the approximation error of the score. To overcome this problem, we propose to use stabilization techniques borrowed from reinforcement learning. Experiments that aimed to increase the PESQ score as an example show that the proposed method (i) can stably train a DNN to increase PESQ, (ii) achieved the state-of-the-art PESQ score on a public dataset, and (iii) resulted in better sound quality than conventional methods according to a subjective evaluation.
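
The surrogate-gradient idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' method. Everything here is an assumption made for illustration: the "primary model" is a single scalar parameter `theta`, the black-box score is a hypothetical quantized function (not PESQ), and the auxiliary DNN is replaced by a local linear least-squares fit. The reinforcement-learning stabilization techniques the paper proposes are not shown.

```python
import random


def black_box_score(y: float) -> float:
    """Hypothetical stand-in for a non-differentiable metric (NOT PESQ).

    The score is quantized to one decimal, so its true gradient is zero
    almost everywhere and cannot be used for training directly.
    """
    return round(-(y - 3.0) ** 2, 1)


def fit_local_surrogate(center, radius, n_samples, rng):
    """Fit score ~ slope * y + intercept by least squares around `center`.

    This linear fit plays the role of the auxiliary approximator (an
    assumption; the abstract describes an auxiliary DNN instead).
    """
    ys = [center + rng.uniform(-radius, radius) for _ in range(n_samples)]
    scores = [black_box_score(y) for y in ys]
    mean_y = sum(ys) / n_samples
    mean_s = sum(scores) / n_samples
    cov = sum((y - mean_y) * (s - mean_s) for y, s in zip(ys, scores))
    var = sum((y - mean_y) ** 2 for y in ys)
    slope = cov / var
    intercept = mean_s - slope * mean_y
    return slope, intercept


def train(theta=0.0, steps=200, lr=0.05, seed=0):
    """Update `theta` by gradient ascent on the surrogate, not the metric."""
    rng = random.Random(seed)
    for _ in range(steps):
        slope, _ = fit_local_surrogate(theta, radius=0.5, n_samples=32, rng=rng)
        theta += lr * slope  # the surrogate's gradient w.r.t. the output
    return theta


final_theta = train()
print(final_theta, black_box_score(final_theta))
```

Because the update follows the fitted slope rather than the true score, any error in the fit feeds straight into the parameter update. That is the source of instability the abstract refers to.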

Comments:	accepted to the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:2002.05879 [eess.AS]
	(or arXiv:2002.05879v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2002.05879

Submission history

From: Masaki Kawanaka
[v1] Fri, 14 Feb 2020 05:44:17 UTC (1,943 KB)
