persoDA: Personalized Data Augmentation for Personalized ASR

Parada, Pablo Peso; Fontalis, Spyros; Jalal, Md Asif; Saravanan, Karthikeyan; Drosou, Anastasios; Ozay, Mete; Lee, Gil Ho; Lee, Jungin; Jung, Seokyeong

arXiv:2501.09113 (eess)

[Submitted on 15 Jan 2025 (v1), last revised 17 Jan 2025 (this version, v2)]

Abstract: Data augmentation (DA) is ubiquitous in the training of Automatic Speech Recognition (ASR) models. DA increases data variability and improves robustness and generalization to different acoustic distortions. Recently, personalization of ASR models on mobile devices has been shown to improve Word Error Rate (WER). This paper evaluates data augmentation in this context and proposes persoDA, a DA method driven by the user's data that is used to personalize ASR. persoDA aims to augment training with data specifically tuned to the acoustic characteristics of the end user, as opposed to standard augmentation based on Multi-Condition Training (MCT), which applies random reverberation and noise. Our evaluation with a conformer-based ASR baseline trained on Librispeech and personalized for VOICES shows that persoDA achieves a 13.9% relative WER reduction over standard data augmentation (random noise and reverberation). Furthermore, persoDA converges 16% to 20% faster than MCT.
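The abstract contrasts persoDA with standard MCT-style augmentation, which applies random reverberation and noise, and reports results as a relative WER reduction. As a rough illustration of that baseline and metric only (not of persoDA, whose procedure is not described in the abstract), the following is a minimal sketch in Python with NumPy. The function names, the SNR range, and the synthetic signals are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into speech so that the mixture has the requested SNR (dB)."""
    # Repeat or trim the noise so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    # Choose the scale so speech_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def mct_augment(speech, rirs, noises, rng, snr_range_db=(5.0, 20.0)):
    """Hypothetical MCT-style augmentation: random reverberation, then random noise.

    The SNR range is an assumed default, not a value taken from the paper.
    """
    rir = rirs[rng.integers(len(rirs))]        # random room impulse response
    noise = noises[rng.integers(len(noises))]  # random noise recording
    # Reverberate by convolving with the RIR, keeping the original length.
    reverberant = np.convolve(speech, rir)[: len(speech)]
    snr_db = rng.uniform(*snr_range_db)
    return add_noise_at_snr(reverberant, noise, snr_db)


def relative_wer_reduction(wer_baseline, wer_new):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline


# Usage with synthetic stand-ins for real speech, RIR, and noise recordings.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
rirs = [np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)]
noises = [rng.standard_normal(4000)]
augmented = mct_augment(speech, rirs, noises, rng)
```

A personalized scheme would presumably replace the random draws with choices informed by the user's recordings, but how persoDA does this is not specified in the abstract, so the sketch stops at the random baseline.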

Comments:	ICASSP'25. Copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2501.09113 [eess.AS]
	(or arXiv:2501.09113v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2501.09113

Submission history

From: Pablo Peso Parada
[v1] Wed, 15 Jan 2025 19:46:42 UTC (360 KB)
[v2] Fri, 17 Jan 2025 15:03:56 UTC (360 KB)
