LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration

Lee, Sangmin; Chung, Woo-Jin; Kang, Hong-Goo

Computer Science > Computation and Language

arXiv:2412.15299 (cs)

[Submitted on 19 Dec 2024 (v1), last revised 23 Dec 2024 (this version, v2)]

Title:LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration

Authors:Sangmin Lee, Woo-Jin Chung, Hong-Goo Kang

View PDF HTML (experimental)

Abstract:Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules while matching the performance of state-of-the-art models trained on a minimal amount of data. Our pipeline consists of two key steps. First, we utilize a universal transcription generator to unify orthographic features into Romanized form and capture common phonetic characteristics across diverse languages. Second, we utilize a universal converter to transform these universal transcriptions into language-specific ones. In experiments, we demonstrate the effectiveness of our proposed method leveraging universal transcriptions for massively multilingual ASR. Our pipeline achieves a relative error reduction rate of 45% when compared to Whisper and performs comparably to MMS, despite being trained on only 0.1% of Whisper's training data. Furthermore, our pipeline does not rely on any language-specific modules. However, it performs on par with zero-shot ASR approaches which utilize additional language-specific lexicons and language models. We expect this framework to serve as a cornerstone for flexible multilingual ASR systems that are generalizable even to unseen languages.

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2412.15299 [cs.CL]
	(or arXiv:2412.15299v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.15299

Submission history

From: Sangmin Lee [view email]
[v1] Thu, 19 Dec 2024 10:39:08 UTC (543 KB)
[v2] Mon, 23 Dec 2024 03:37:25 UTC (543 KB)

Computer Science > Computation and Language

Title:LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators