Trainable Adaptive Window Switching for Speech Enhancement

Koizumi, Yuma; Harada, Noboru; Haneda, Yoichi

arXiv:1811.02438 (eess)

[Submitted on 5 Nov 2018 (v1), last revised 19 Feb 2019 (this version, v4)]

Abstract: This study proposes a trainable adaptive window switching (AWS) method and applies it to a deep neural network (DNN) for speech enhancement in the modified discrete cosine transform (MDCT) domain. Time-frequency (T-F) mask processing in the short-time Fourier transform (STFT) domain is a typical speech enhancement method. To recover the target signal precisely, DNN-based short-time frequency transforms have recently been investigated and used instead of the STFT. However, any such fixed-resolution short-time frequency transform is limited by the T-F resolution trade-off imposed by the uncertainty principle, so not only the transform itself but also the length of the windowing function should be optimized. To overcome this problem, we incorporate AWS into the speech enhancement procedure: a DNN selects the windowing function of each time frame depending on the input signal. We confirmed that the proposed method achieved a higher signal-to-distortion ratio than conventional speech enhancement methods in fixed-resolution frequency domains.
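
The resolution trade-off mentioned in the abstract can be made concrete with a short sketch. The code below is illustrative only and is not the method of the paper: `tf_resolution` simply evaluates the standard relations between window length, frame duration, and frequency-bin spacing, and `pick_window_length` is a hypothetical hand-written energy rule standing in for the DNN-driven switch that the paper trains. The function names, the 16 kHz sample rate, the window lengths (1024 and 128 samples), and the threshold are all assumptions chosen for illustration.

```python
import numpy as np


def tf_resolution(window_length, sample_rate):
    """Return (frame duration in ms, frequency-bin spacing in Hz)."""
    time_span_ms = 1000.0 * window_length / sample_rate
    bin_spacing_hz = sample_rate / window_length
    return time_span_ms, bin_spacing_hz


def pick_window_length(frame, long_len=1024, short_len=128, ratio_threshold=8.0):
    """Hypothetical rule-based switch (NOT the paper's trainable AWS).

    Splits the frame into short blocks and picks the short window when the
    energy jumps sharply from one block to the next (a transient);
    otherwise it keeps the long window.
    """
    n_blocks = len(frame) // short_len
    if n_blocks < 2:
        # Not enough data to compare neighbouring blocks.
        return long_len
    blocks = np.asarray(frame[: n_blocks * short_len], dtype=float)
    blocks = blocks.reshape(n_blocks, short_len)
    # Small constant avoids division by zero on silent blocks.
    energy = np.sum(blocks ** 2, axis=1) + 1e-12
    largest_jump = np.max(energy[1:] / energy[:-1])
    return short_len if largest_jump > ratio_threshold else long_len


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate
    for n in (1024, 128):
        ms, hz = tf_resolution(n, sample_rate)
        print(f"window={n}: {ms:.1f} ms per frame, {hz:.3f} Hz per bin")

    t = np.arange(1024) / sample_rate
    steady_tone = np.sin(2 * np.pi * 440.0 * t)   # stationary signal
    onset = np.zeros(1024)
    onset[512:] = 1.0                             # abrupt onset mid-frame
    print(pick_window_length(steady_tone), pick_window_length(onset))
```

With these assumed values, the long window gives fine frequency spacing but smears events over a long frame, while the short window does the opposite. A fixed choice cannot serve both stationary and transient segments, which is the motivation the abstract gives for selecting the window per frame.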

Comments:	accepted to the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP); Machine Learning (stat.ML)
Cite as:	arXiv:1811.02438 [eess.AS]
	(or arXiv:1811.02438v4 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1811.02438

Submission history

From: Yuma Koizumi
[v1] Mon, 5 Nov 2018 12:25:42 UTC (1,016 KB)
[v2] Fri, 30 Nov 2018 09:14:01 UTC (1,166 KB)
[v3] Mon, 18 Feb 2019 05:45:31 UTC (1,573 KB)
[v4] Tue, 19 Feb 2019 23:56:50 UTC (1,573 KB)
