Robust Tuning Datasets for Statistical Machine Translation

Nakov, Preslav; Vogel, Stephan

arXiv:1710.00346 (cs)

[Submitted on 1 Oct 2017]

Abstract: We explore the idea of automatically crafting a tuning dataset for Statistical Machine Translation (SMT) that makes the hyper-parameters of the SMT system more robust with respect to some specific deficiencies of the parameter tuning algorithms. This is an under-explored research direction, which can allow better parameter tuning. In this paper, we achieve this goal by selecting a subset of the available sentence pairs that is better suited to specific combinations of optimizers, objective functions, and evaluation measures. We demonstrate the potential of the idea with the pairwise ranking optimization (PRO) optimizer, which is known to yield translations that are too short. We show that this learning problem can be alleviated by tuning on a subset of the development set, selected based on sentence length. In particular, using the longest 50% of the tuning sentences, we achieve a two-fold tuning speedup and improvements in BLEU score that rival those of alternatives that fix BLEU+1's smoothing instead.
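
The length-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_longest`, the representation of the tuning set as (source, reference) string pairs, and the choice to measure length as the whitespace token count of the source side are all assumptions made for illustration. The abstract only states that the longest 50% of the tuning sentences are used.

```python
from typing import List, Tuple

# Hypothetical representation of a tuning set: (source, reference) pairs.
SentencePair = Tuple[str, str]


def select_longest(pairs: List[SentencePair],
                   fraction: float = 0.5) -> List[SentencePair]:
    """Keep the longest `fraction` of sentence pairs, in their original order.

    Length is the whitespace token count of the source side; this is an
    assumption, since the abstract does not say which side is measured.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not pairs:
        return []
    # Number of pairs to keep (rounded down, but at least one).
    keep = max(1, int(len(pairs) * fraction))
    # Rank indices longest first; ties are broken by original position.
    ranked = sorted(range(len(pairs)),
                    key=lambda i: (-len(pairs[i][0].split()), i))
    # Restore the original order of the selected pairs.
    chosen = sorted(ranked[:keep])
    return [pairs[i] for i in chosen]


if __name__ == "__main__":
    dev = [
        ("a b c", "x y z"),
        ("a", "x"),
        ("a b c d", "w x y z"),
        ("a b", "x y"),
    ]
    for src, ref in select_longest(dev):
        print(src, "|||", ref)
```

The selected subset would then be passed to the tuner in place of the full development set. Keeping the original order is a design choice made here for reproducibility and is not something the abstract specifies.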

Comments:	RANLP-2017
Subjects:	Computation and Language (cs.CL)
MSC classes:	68T50
ACM classes:	I.2.7
Cite as:	arXiv:1710.00346 [cs.CL]
	(or arXiv:1710.00346v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1710.00346

Submission history

From: Preslav Nakov
[v1] Sun, 1 Oct 2017 13:18:48 UTC (77 KB)
