A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport

Kaloga, Yacouba; Kumar, Shashi; Motlicek, Petr; Kodrasi, Ina

arXiv:2502.01588 (cs)

[Submitted on 3 Feb 2025]

Abstract: Accurate sequence-to-sequence (seq2seq) alignment is critical for applications like medical speech analysis and language learning tools relying on automatic speech recognition (ASR). State-of-the-art end-to-end (E2E) ASR systems, such as Connectionist Temporal Classification (CTC) and transducer-based models, suffer from peaky behavior and alignment inaccuracies. In this paper, we propose a novel differentiable alignment framework based on one-dimensional optimal transport, enabling the model to learn a single alignment and perform ASR in an E2E manner. We introduce a pseudo-metric, called Sequence Optimal Transport Distance (SOTD), over the sequence space and discuss its theoretical properties. Based on the SOTD, we propose the Optimal Temporal Transport Classification (OTTC) loss for ASR and contrast its behavior with CTC. Experimental results on the TIMIT, AMI, and LibriSpeech datasets show that our method considerably improves alignment performance, though with a trade-off in ASR performance when compared to CTC. We believe this work opens new avenues for seq2seq alignment research, providing a solid foundation for further exploration and development within the community.

Subjects:	Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as:	arXiv:2502.01588 [cs.LG]
	(or arXiv:2502.01588v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.01588

Submission history

From: Shashi Kumar
[v1] Mon, 3 Feb 2025 18:20:29 UTC (1,426 KB)
