Showing 1–3 of 3 results for author: Ragnarsson, P O

  1. arXiv:2305.17906  [pdf, other]

    cs.CL

    Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora

    Authors: Svanhvít Lilja Ingólfsdóttir, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Haukur Barri Símonarson, Vilhjálmur Þorsteinsson, Vésteinn Snæbjarnarson

    Abstract: Grammatical error correction (GEC) is the task of correcting typos, spelling, punctuation and grammatical issues in text. Approaching the problem as a sequence-to-sequence task, we compare the use of a common subword unit vocabulary and byte-level encoding. Initial synthetic training data is created using an error-generating pipeline, and used for finetuning two subword-level models and one byte-l…

    Submitted 29 May, 2023; originally announced May 2023.

  2. arXiv:2201.05601  [pdf, ps, other]

    cs.CL

    A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models

    Authors: Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson

    Abstract: We train several language models for Icelandic, including IceBERT, that achieve state-of-the-art performance in a variety of downstream tasks, including part-of-speech tagging, named entity recognition, grammatical error detection and constituency parsing. To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high quality texts fo…

    Submitted 18 January, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

  3. arXiv:2109.07343  [pdf, other]

    cs.CL

    Miðeind's WMT 2021 submission

    Authors: Haukur Barri Símonarson, Vésteinn Snæbjarnarson, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson

    Abstract: We present Miðeind's submission for the English$\to$Icelandic and Icelandic$\to$English subsets of the 2021 WMT news translation task. Transformer-base models are trained for translation on parallel data to generate backtranslations iteratively. A pretrained mBART-25 model is then adapted for translation using parallel data as well as the last backtranslation iteration. This adapted pretrained mod…

    Submitted 15 September, 2021; originally announced September 2021.

    Journal ref: 2021.wmt-1.9