How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

Iwamoto, Kazuma; Ochiai, Tsubasa; Delcroix, Marc; Ikeshita, Rintaro; Sato, Hiroshi; Araki, Shoko; Katagiri, Shigeru

arXiv:2201.06685 (eess)

[Submitted on 18 Jan 2022 (v1), last revised 30 Mar 2022 (this version, v2)]

Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the enhanced speech) can monotonically increase the signal-to-artifact ratio under a mild condition. Accordingly, we experimentally confirm that OA improves ASR performance for both simulated and real recordings. The findings of this paper provide a better understanding of the influence of SE errors on ASR and open the door to future research on novel approaches for designing effective single-channel SE front-ends for ASR.
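The two ideas in the abstract, OPD and OA, can be illustrated with a small NumPy sketch. This is a toy illustration under stated assumptions, not the authors' implementation: it projects onto the span of the two source signals without any time shifts, it defines the signal-to-artifact ratio as the energy inside that span over the energy outside it, and it uses synthetic random signals. The function names `opd`, `sar_db`, and `observation_adding` are hypothetical.

```python
import numpy as np

def opd(enhanced, speech, noise):
    """Split `enhanced` into target, noise-error, and artifact components.

    Assumption: plain least-squares projection onto span{speech, noise};
    the paper's exact projection may differ.
    """
    # Component explained by the speech source alone.
    target = (enhanced @ speech) / (speech @ speech) * speech
    # Component explained by any linear combination of speech and noise.
    basis = np.stack([speech, noise], axis=1)  # shape (T, 2)
    coef, *_ = np.linalg.lstsq(basis, enhanced, rcond=None)
    in_span = basis @ coef
    noise_err = in_span - target   # explained only once noise is included
    artifact = enhanced - in_span  # not a linear combination of the sources
    return target, noise_err, artifact

def sar_db(enhanced, speech, noise):
    """Signal-to-artifact ratio in dB (assumed definition, see lead-in)."""
    target, noise_err, artifact = opd(enhanced, speech, noise)
    return 10 * np.log10(np.sum((target + noise_err) ** 2) / np.sum(artifact ** 2))

def observation_adding(enhanced, observed, alpha):
    """OA: add a scaled copy of the observed mixture to the enhanced signal."""
    return enhanced + alpha * observed

# Synthetic stand-ins for real recordings (assumption, for illustration only).
rng = np.random.default_rng(0)
T = 16000
speech = rng.standard_normal(T)
noise = 0.5 * rng.standard_normal(T)
observed = speech + noise
# Toy "enhanced" signal: residual noise plus a distortion unrelated to the sources.
enhanced = speech + 0.1 * noise + 0.2 * rng.standard_normal(T)

sar_before = sar_db(enhanced, speech, noise)
sar_after = sar_db(observation_adding(enhanced, observed, alpha=0.5), speech, noise)
print(f"SAR before OA: {sar_before:.2f} dB, after OA: {sar_after:.2f} dB")
```

Because the observed signal lies entirely in the span of speech and noise, adding it leaves the artifact component of this sketch unchanged and only alters the in-span part, which is why the ratio rises in this toy case. The trade-off is that OA also reintroduces noise, so the scale `alpha` is a tuning choice.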

Comments:	5 pages, 5 figures, submitted to Interspeech 2022
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2201.06685 [eess.AS]
	(or arXiv:2201.06685v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2201.06685

Submission history

From: Tsubasa Ochiai
[v1] Tue, 18 Jan 2022 01:12:01 UTC (635 KB)
[v2] Wed, 30 Mar 2022 11:18:52 UTC (502 KB)
