Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

Bansal, Sameer; Kamper, Herman; Livescu, Karen; Lopez, Adam; Goldwater, Sharon

arXiv:1809.01431 (cs)

[Submitted on 5 Sep 2018 (v1), last revised 27 Feb 2019 (this version, v2)]

Abstract: We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. Through an ablation study, we find that the pre-trained encoder (acoustic model) accounts for most of the improvement, despite the fact that the shared language in these tasks is the target language text, not the source language audio. Applying this insight, we show that pre-training on ASR helps ST even when the ASR language differs from both source and target ST languages: pre-training on French ASR also improves Spanish-English ST. Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
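
The parameter transfer described in the abstract (initialise the ST model from a pre-trained ASR model, then fine-tune) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper `transfer_parameters`, the parameter names such as `encoder.conv`, and the toy weight values are all hypothetical, and the fine-tuning step itself is omitted. The `prefixes` argument mimics the kind of ablation the abstract mentions, where only the encoder is transferred.

```python
import copy


def transfer_parameters(pretrained, target, prefixes=("encoder.",)):
    """Return a copy of `target` in which every parameter whose name starts
    with one of `prefixes` is replaced by the value from `pretrained`.

    Parameters that are absent from `pretrained`, or that do not match any
    prefix, keep the values they had in `target` (their fresh initialisation).
    """
    result = copy.deepcopy(target)
    for name, value in pretrained.items():
        if name in result and name.startswith(tuple(prefixes)):
            result[name] = copy.deepcopy(value)
    return result


# Toy "models": parameter name -> weights (hypothetical names and values).
asr_model = {
    "encoder.conv": [0.5, -0.2],
    "encoder.lstm": [0.1, 0.3],
    "decoder.lstm": [0.7, 0.9],
}
st_model = {
    "encoder.conv": [0.0, 0.0],
    "encoder.lstm": [0.0, 0.0],
    "decoder.lstm": [0.0, 0.0],
}

# Ablation-style variants: transfer only the encoder, or encoder and decoder.
encoder_only = transfer_parameters(asr_model, st_model)
full = transfer_parameters(asr_model, st_model, prefixes=("encoder.", "decoder."))
```

In this sketch `encoder_only` would then be fine-tuned on the low-resource ST data; the decoder starts from scratch, which corresponds to the setting where the ASR language need not match the ST target language.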

Comments:	Accepted for publication in NAACL 2019
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1809.01431 [cs.CL]
	(or arXiv:1809.01431v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1809.01431

Submission history

From: Sameer Bansal
[v1] Wed, 5 Sep 2018 10:56:30 UTC (137 KB)
[v2] Wed, 27 Feb 2019 23:47:26 UTC (138 KB)
