Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax

Patil, Aditya; Joshi, Vikas; Agrawal, Purvi; Mehta, Rupesh

doi:10.1109/SLT54892.2023.10022475

arXiv:2401.11645 (eess)

[Submitted on 22 Jan 2024]

Abstract: Despite several advances in multilingual modeling, recognizing multiple languages with a single neural model remains challenging when the input language is unknown, and most multilingual models assume that the input language is available. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach in which a single neural model recognizes both languages and supports switching between them, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. Because the language-specific posteriors are combined, the model produces a single posterior probability over all output symbols, which enables a single beam search decoding and allows dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline with relative word error rate reductions of 13.3%, 8.23% and 1.3% on the Hindi, English and code-mixed test sets, respectively.
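
The combination step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the attention weights are computed, so the scoring layer, the tanh fusion of encoder and prediction outputs, the vocabulary sizes, and all function names below are assumptions chosen for illustration. The sketch only shows why weighting each language-specific softmax by an attention weight and concatenating the results yields one valid posterior over all output symbols.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def combined_posterior(enc, pred, joint_weights, attn_weights):
    """Combine language-specific posteriors into a single distribution.

    enc, pred     : shared encoder and prediction network outputs (same size)
    joint_weights : one output matrix per language (hypothetical joint networks)
    attn_weights  : matrix scoring each language (hypothetical attention layer)
    """
    # Assumed fusion of the shared encoder and prediction outputs.
    hidden = [math.tanh(e + p) for e, p in zip(enc, pred)]
    # One softmax per language, each over that language's own symbols.
    per_language = [softmax(linear(w, hidden)) for w in joint_weights]
    # Attention weights over languages; they sum to 1.
    lang_weights = softmax(linear(attn_weights, hidden))
    # Scale each language posterior by its weight and concatenate.
    combined = []
    for weight, posterior in zip(lang_weights, per_language):
        combined.extend(weight * p for p in posterior)
    return lang_weights, combined


rng = random.Random(0)
dim = 8
vocab_sizes = [5, 7]  # illustrative symbol counts for the two languages
enc = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
pred = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
joint_weights = [random_matrix(v, dim, rng) for v in vocab_sizes]
attn_weights = random_matrix(len(vocab_sizes), dim, rng)

lang_weights, posterior = combined_posterior(enc, pred, joint_weights, attn_weights)
```

Because each language-specific posterior sums to 1 and the attention weights sum to 1, the concatenated vector also sums to 1. A single beam search can therefore rank symbols from both languages at every step, which is what lets the output switch language mid-utterance. The sketch ignores details such as how a blank symbol would be shared between the two softmax layers.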

Comments:	Published in IEEE's Spoken Language Technology (SLT) 2022, 8 pages (6 + 2 for references), 5 figures
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2401.11645 [eess.AS]
	(or arXiv:2401.11645v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2401.11645
Journal reference:	2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 252-259
Related DOI:	https://doi.org/10.1109/SLT54892.2023.10022475

Submission history

From: Aditya Patil
[v1] Mon, 22 Jan 2024 01:44:42 UTC (1,607 KB)
