An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Hannan, Abdul; Brutti, Alessio; Nawaz, Shah; Noman, Mubashir

arXiv:2505.16991 (cs)

[Submitted on 22 May 2025 (v1), last revised 28 May 2025 (this version, v2)]

Abstract: Recent advances in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low-resource devices is impractical despite their favorable performance. Existing approaches (pruning, distillation, layer skipping, etc.) transform large models into smaller ones at the cost of significant performance degradation, or require prolonged training of the smaller models to achieve better performance. To address these issues, we introduce an efficacious two-step, representation-learning-based approach capable of producing several small models from a single large model while ensuring considerably better performance in a limited number of epochs. Comprehensive experimentation on ASR benchmarks demonstrates the efficacy of our approach, achieving a three-fold training speed-up and up to 12.54% word error rate improvement.

Comments:	Accepted at InterSpeech 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2505.16991 [cs.CV]
	(or arXiv:2505.16991v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.16991

Submission history

From: Mubashir Noman
[v1] Thu, 22 May 2025 17:55:09 UTC (472 KB)
[v2] Wed, 28 May 2025 17:19:11 UTC (476 KB)
