Showing 1–2 of 2 results for author: Menne, T

Search v0.5.6 released 2020-02-24

arXiv:1905.03500 [pdf, other]

cs.SD cs.CL eess.AS

doi 10.21437/Interspeech.2019-1728

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

Authors: Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

Abstract: Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker ASR is the deep clustering (DPCL) approach. Combining DPCL with a state-of-the-art hybrid acoustic model, we obtain a word error rate (WER) of 16.5 % on the commonly used wsj0-2mix data… ▽ More Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker ASR is the deep clustering (DPCL) approach. Combining DPCL with a state-of-the-art hybrid acoustic model, we obtain a word error rate (WER) of 16.5 % on the commonly used wsj0-2mix dataset, which is the best performance reported thus far to the best of our knowledge. The wsj0-2mix dataset contains simulated cross-talk where the speech of multiple speakers overlaps for almost the entire utterance. In a more realistic ASR scenario the audio signal contains significant portions of single-speaker speech and only part of the signal contains speech of multiple competing speakers. This paper investigates obstacles of applying DPCL as a preprocessing method for ASR in such a scenario of sparsely overlapping speech. To this end we present a data simulation approach, closely related to the wsj0-2mix dataset, generating sparsely overlapping speech datasets of arbitrary overlap ratio. The analysis of applying DPCL to sparsely overlapping speech is an important interim step between the fully overlapping datasets like wsj0-2mix and more realistic ASR datasets, such as CHiME-5 or AMI. △ Less

Submitted 25 September, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

Journal ref: Proceedings of INTERSPEECH 2019
arXiv:1806.07407 [pdf, other]

cs.CL cs.SD eess.AS

Speaker Adapted Beamforming for Multi-Channel Automatic Speech Recognition

Authors: Tobias Menne, Ralf Schlüter, Hermann Ney

Abstract: This paper presents, in the context of multi-channel ASR, a method to adapt a mask based, statistically optimal beamforming approach to a speaker of interest. The beamforming vector of the statistically optimal beamformer is computed by utilizing speech and noise masks, which are estimated by a neural network. The proposed adaptation approach is based on the integration of the beamformer, which in… ▽ More This paper presents, in the context of multi-channel ASR, a method to adapt a mask based, statistically optimal beamforming approach to a speaker of interest. The beamforming vector of the statistically optimal beamformer is computed by utilizing speech and noise masks, which are estimated by a neural network. The proposed adaptation approach is based on the integration of the beamformer, which includes the mask estimation network, and the acoustic model of the ASR system. This allows for the propagation of the training error, from the acoustic modeling cost function, all the way through the beamforming operation and through the mask estimation network. By using the results of a first pass recognition and by keeping all other parameters fixed, the mask estimation network can therefore be fine tuned by retraining. Utterances of a speaker of interest can thus be used in a two pass approach, to optimize the beamforming for the speech characteristics of that specific speaker. It is shown that this approach improves the ASR performance of a state-of-the-art multi-channel ASR system on the CHiME-4 data. Furthermore the effect of the adaptation on the estimated speech masks is discussed. △ Less

Submitted 19 June, 2018; originally announced June 2018.

Comments: submitted to IEEE SLT 2018

Search v0.5.6 released 2020-02-24