On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

Lam, Tsz Kin; Ohta, Mayumi; Schamoni, Shigehiko; Riezler, Stefan

doi:10.21437/Interspeech.2021-1679

arXiv:2104.01393 (cs)

[Submitted on 3 Apr 2021 (v1), last revised 9 Jun 2021 (this version, v2)]

Abstract: We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate previously unseen training pairs. The speech representations are sampled from an audio dictionary that has been extracted from the training corpus and inject speaker variations into the training examples. The transcribed tokens are either predicted by a language model such that the augmented data pairs are semantically close to the original data, or randomly sampled. Both strategies result in training pairs that improve robustness in ASR training. Our experiments on a Seq-to-Seq architecture show that ADA can be applied on top of SpecAugment, and achieves about 9-23% and 4-15% relative improvements in WER over SpecAugment alone on LibriSpeech 100h and LibriSpeech 960h test datasets, respectively.
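
The aligned replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each utterance comes with a token-level alignment given as `(token, start_frame, end_frame)` triples over a feature matrix, that the audio dictionary simply maps each token to the feature segments observed for it in the training corpus, and that replacement tokens are drawn uniformly at random (the random-sampling variant; the language-model variant is not shown). The names `build_audio_dictionary`, `augment`, and `replace_prob` are hypothetical.

```python
import random
import numpy as np


def build_audio_dictionary(corpus):
    """Collect feature segments per token (hypothetical helper).

    corpus: iterable of (features, alignment) pairs, where features is an
    array of shape [frames, dims] and alignment is a list of
    (token, start_frame, end_frame) triples.
    """
    dictionary = {}
    for features, alignment in corpus:
        for token, start, end in alignment:
            dictionary.setdefault(token, []).append(features[start:end])
    return dictionary


def augment(features, alignment, dictionary, replace_prob=0.3, rng=None):
    """Replace tokens and their aligned speech segments together.

    Assumption: replacement tokens are sampled uniformly from the
    dictionary keys; frames outside any aligned span are kept unchanged.
    """
    rng = rng or random.Random()
    vocab = sorted(dictionary)
    new_tokens, new_segments = [], []
    cursor = 0
    for token, start, end in alignment:
        # Keep unaligned frames (e.g. silence) that precede this token.
        new_segments.append(features[cursor:start])
        if vocab and rng.random() < replace_prob:
            # Swap the token and its audio segment in an aligned manner.
            token = rng.choice(vocab)
            segment = rng.choice(dictionary[token])
        else:
            segment = features[start:end]
        new_tokens.append(token)
        new_segments.append(segment)
        cursor = end
    # Keep any trailing frames after the last aligned token.
    new_segments.append(features[cursor:])
    return np.concatenate(new_segments, axis=0), new_tokens
```

Because a sampled segment may come from a different utterance and have a different length, the augmented feature matrix can differ in duration and speaker characteristics from the original, which is one way to read the abstract's remark about injecting speaker variations. Calling `augment` inside the data loader, once per training example, would correspond to the on-the-fly setting.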

Comments:	Accepted at INTERSPEECH 2021
Subjects:	Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2104.01393 [cs.CL]
	(or arXiv:2104.01393v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2104.01393
Related DOI:	https://doi.org/10.21437/Interspeech.2021-1679

Submission history

From: Shigehiko Schamoni
[v1] Sat, 3 Apr 2021 13:00:00 UTC (276 KB)
[v2] Wed, 9 Jun 2021 19:48:46 UTC (277 KB)

