Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition

Lee, Jeehye; Lee, Myungin; Chang, Joon-Hyuk

arXiv:1608.04983 (cs)

[Submitted on 17 Aug 2016]

Abstract: Distant speech recognition is challenging, particularly because speech signals are corrupted by reverberation when the speaker is far from the microphone. To cope with the wide range of reverberation found in real-world situations, we present novel approaches to acoustic modeling: an ensemble of deep neural networks (DNNs) and an ensemble of jointly trained DNNs. First, in a setup step, multiple DNNs are established, each corresponding to a different reverberation time 60 (RT60). Each model in the ensemble of DNN acoustic models is then further jointly trained, covering both feature mapping and acoustic modeling, where the feature mapping serves as a dereverberation front-end. In the testing phase, the two most likely DNNs are chosen from the ensemble using maximum a posteriori (MAP) probabilities, computed online from maximum likelihood (ML)-based blind RT60 estimation; the posterior probability outputs of the two DNNs are then combined using the ML-based weights as a simple average. Extensive experiments demonstrate that the proposed approach leads to substantial improvements in speech recognition accuracy over conventional DNN baseline systems under diverse reverberant conditions.
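
The testing-phase selection and combination described in the abstract might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify how the MAP probabilities or the weights are computed, so the Gaussian likelihood around each model's nominal RT60, the spread `sigma`, the uniform prior, and the function names `model_posteriors` and `combine_top_two` are all assumptions introduced here. The blind RT60 estimate is taken as a given input.

```python
import numpy as np


def model_posteriors(rt60_estimate, model_rt60s, sigma=0.1, priors=None):
    """Posterior probability of each ensemble member given an RT60 estimate.

    Assumption: the likelihood of the estimate under model k is Gaussian,
    centred on that model's nominal RT60 with a shared spread `sigma`.
    """
    model_rt60s = np.asarray(model_rt60s, dtype=float)
    if priors is None:
        # Assumption: uniform prior over the ensemble members.
        priors = np.full(model_rt60s.shape, 1.0 / model_rt60s.size)
    log_lik = -0.5 * ((rt60_estimate - model_rt60s) / sigma) ** 2
    log_post = log_lik + np.log(priors)
    log_post -= log_post.max()  # stabilise the exponentiation
    post = np.exp(log_post)
    return post / post.sum()


def combine_top_two(rt60_estimate, model_rt60s, senone_posteriors, sigma=0.1):
    """Pick the two most probable models and average their outputs.

    `senone_posteriors[k]` is the (frames, senones) posterior matrix that
    model k produced for the utterance.
    """
    post = model_posteriors(rt60_estimate, model_rt60s, sigma)
    top2 = np.argsort(post)[-2:][::-1]       # indices of the two largest
    weights = post[top2] / post[top2].sum()  # renormalise over the pair
    stacked = np.stack([senone_posteriors[i] for i in top2])
    combined = np.tensordot(weights, stacked, axes=1)  # (frames, senones)
    return top2, weights, combined
```

Because the two weights are renormalised to sum to one, each combined frame remains a valid probability distribution whenever the inputs are. Setting both weights to 0.5 would reduce this to a plain unweighted mean of the two outputs.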

Comments:	9 pages, 8 figures, 1 table
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1608.04983 [cs.CL]
	(or arXiv:1608.04983v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1608.04983

Submission history

From: Joon-Hyuk Chang
[v1] Wed, 17 Aug 2016 14:43:17 UTC (2,319 KB)
