FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

Seth, Ashish; Ghosh, Sreyan; Umesh, S.; Manocha, Dinesh

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2312.13026 (eess)

[Submitted on 20 Dec 2023]

Abstract: Continued pre-training (CP) offers multiple advantages, such as target-domain adaptation and the potential to exploit the continuous stream of unlabeled data available online. However, continued pre-training on out-of-domain distributions often causes catastrophic forgetting of previously acquired knowledge, which results in sub-optimal ASR performance. This paper presents FusDom, a simple and novel methodology for SSL-based continued pre-training. FusDom learns speech representations that are robust and adaptive yet not forgetful of concepts seen in the past. Instead of solving the SSL pre-text task on the output representations of a single model, FusDom leverages two identical pre-trained SSL models, a teacher and a student, with a modified pre-training head to solve the CP SSL pre-text task. This head employs a cross-attention mechanism between the representations of both models, and only the student receives gradient updates. Finally, the student is fine-tuned for ASR. In practice, FusDom significantly outperforms all our baselines across settings, with improvements of 0.2 to 7.3 WER in the target domain while retaining performance in the earlier domain.
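To make the cross-attention step concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' implementation. The abstract does not say which model supplies queries and which supplies keys and values, so the choice below (student frames as queries, teacher frames as keys and values) is an assumption, as are the names `cross_attention`, `student_reps`, and `teacher_reps`, the toy sizes, and the omission of learned projection matrices. The sketch has no gradient computation; the teacher representations are simply treated as constants.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame attends over all key frames.

    queries: T_q vectors of length d; keys and values: T_k vectors of length d.
    Returns (fused, weights) with shapes T_q x d and T_q x T_k.
    """
    d = len(queries[0])
    fused, weights = [], []
    for q in queries:
        # Similarity of this query frame to every key frame, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value frames.
        fused.append([sum(w[t] * values[t][j] for t in range(len(values)))
                      for j in range(d)])
    return fused, weights


random.seed(0)
T, d = 4, 8  # toy sizes (assumed): 4 frames, 8-dimensional features

# Stand-ins for the output representations of the two SSL models.
student_reps = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
teacher_reps = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]  # constants here

# Assumed direction: student queries attend over teacher keys and values.
fused, attn = cross_attention(student_reps, teacher_reps, teacher_reps)
```

Each row of `attn` is a probability distribution over teacher frames, so every fused frame is a convex combination of teacher frames selected by the student's queries. In a trainable version, the pre-text loss would be computed on top of the fused output and back-propagated only into the student.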

Comments:	Accepted at ICASSP 2024. Code: this https URL
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2312.13026 [eess.AS]
	(or arXiv:2312.13026v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2312.13026

Submission history

From: Sreyan Ghosh
[v1] Wed, 20 Dec 2023 13:50:05 UTC (5,228 KB)
