Search | arXiv e-print repository

Domain adapted machine translation: What does catastrophic forgetting forget and why?

Authors: Danielle Saunders, Steve DeNeefe

Abstract: Neural Machine Translation (NMT) models can be specialized by domain adaptation, often involving fine-tuning on a dataset of interest. This process risks catastrophic forgetting: rapid loss of generic translation quality. Forgetting has been widely observed, with many mitigation methods proposed. However, the causes of forgetting and the relationship between forgetting and adaptation data are unde… ▽ More Neural Machine Translation (NMT) models can be specialized by domain adaptation, often involving fine-tuning on a dataset of interest. This process risks catastrophic forgetting: rapid loss of generic translation quality. Forgetting has been widely observed, with many mitigation methods proposed. However, the causes of forgetting and the relationship between forgetting and adaptation data are under-explored. This paper takes a novel approach to understanding catastrophic forgetting during NMT adaptation by investigating the impact of the data. We provide a first investigation of what is forgotten, and why. We examine the relationship between forgetting and the in-domain data, and show that the amount and type of forgetting is linked to that data's target vocabulary coverage. Our findings pave the way toward better informed NMT domain adaptation. △ Less

Submitted 23 December, 2024; originally announced December 2024.

Comments: EMNLP 2024

arXiv:2302.06579 [pdf, other]

AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature

Authors: Melissa Roemmele, Kyle Shaffer, Katrina Olsen, Yiyi Wang, Steve DeNeefe

Abstract: Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the ling… ▽ More Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit. △ Less

Submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted at EACL 2023

arXiv:2103.03820 [pdf, other]

AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents

Authors: Melissa Roemmele, Deep Sidhpura, Steve DeNeefe, Ling Tsou

Abstract: One strategy for facilitating reading comprehension is to present information in a question-and-answer format. We demo a system that integrates the tasks of question answering (QA) and question generation (QG) in order to produce Q&A items that convey the content of multi-paragraph documents. We report some experiments for QA and QG that yield improvements on both tasks, and assess how they intera… ▽ More One strategy for facilitating reading comprehension is to present information in a question-and-answer format. We demo a system that integrates the tasks of question answering (QA) and question generation (QG) in order to produce Q&A items that convey the content of multi-paragraph documents. We report some experiments for QA and QG that yield improvements on both tasks, and assess how they interact to produce a list of Q&A items for a text. The demo is accessible at qna.sdl.com. △ Less

Submitted 5 March, 2021; originally announced March 2021.

Comments: Accepted at demo track of EACL 2021

Showing 1–3 of 3 results for author: DeNeefe, S