Efficient Speech Translation with Dynamic Latent Perceivers

Tsiamas, Ioannis; Gállego, Gerard I.; Fonollosa, José A. R.; Costa-jussà, Marta R.

arXiv:2210.16264 (cs)

[Submitted on 28 Oct 2022 (v1), last revised 14 Mar 2023 (this version, v2)]

Abstract: Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of the Transformer, a down-sampling step is essential for its adoption in Speech Translation. Instead, in this research, we propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of Transformer baselines across three language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to DLA at inference, and can be flexibly deployed with various computational budgets, without significant drops in translation quality.
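The fixed-length mapping described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single cross-attention step with no learned projections, and it assumes that DLA draws a random subset of `k` latents from a larger pool; the abstract does not spell out the selection rule, so that part is a guess made for illustration. The names `cross_attend`, `encode`, `latent_pool`, and `random_vectors` are hypothetical.

```python
import math
import random


def cross_attend(latents, inputs):
    """Single-head cross-attention without learned projections (a simplification).

    latents: k query vectors; inputs: T key/value vectors (lists of floats).
    Returns k output vectors, so the output length does not depend on T.
    """
    d = len(latents[0])
    outputs = []
    for q in latents:
        # Scaled dot-product score of this latent against every input frame.
        scores = [sum(a * b for a, b in zip(q, x)) / math.sqrt(d) for x in inputs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of the input frames.
        outputs.append([
            sum(w * x[j] for w, x in zip(weights, inputs)) / total
            for j in range(d)
        ])
    return outputs


def encode(inputs, latent_pool, k=None, rng=random):
    """Map inputs to latents; if k is given, use only k latents from the pool.

    ASSUMPTION: uniform random sampling of the subset stands in for DLA here.
    """
    n = len(latent_pool)
    if k is None or k >= n:
        chosen = list(range(n))
    else:
        chosen = sorted(rng.sample(range(n), k))
    return cross_attend([latent_pool[i] for i in chosen], inputs)


def random_vectors(count, dim, rng):
    """Toy stand-in for speech features or latent parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
latent_pool = random_vectors(32, 8, rng)       # pool of 32 latents
short_utterance = random_vectors(50, 8, rng)   # 50 input frames
long_utterance = random_vectors(400, 8, rng)   # 400 input frames

out_short = encode(short_utterance, latent_pool, k=4, rng=rng)
out_long = encode(long_utterance, latent_pool, k=4, rng=rng)
out_full = encode(short_utterance, latent_pool)
print(len(out_short), len(out_long), len(out_full))  # prints: 4 4 32
```

In this sketch the attention cost grows with the number of latents actually used times the input length, not with the size of the pool. That is one way to read the claim that a larger latent space adds no computational overhead, and it is also why `k` can be changed at inference to fit a compute budget.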

Comments:	ICASSP 2023
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2210.16264 [cs.CL]
	(or arXiv:2210.16264v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.16264

Submission history

From: Ioannis Tsiamas
[v1] Fri, 28 Oct 2022 16:52:48 UTC (2,191 KB)
[v2] Tue, 14 Mar 2023 11:08:18 UTC (2,304 KB)
