Showing 1–1 of 1 results for author: Avetisyan, H

  1. arXiv:2408.04303

    cs.CL cs.LG

    Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

    Authors: François Remy, Pieter Delobelle, Hayastan Avetisyan, Alfiya Khabibullina, Miryam de Lhoneux, Thomas Demeester

    Abstract: The development of monolingual language models for low and mid-resource languages continues to be hindered by the difficulty in sourcing high-quality training data. In this study, we present a novel cross-lingual vocabulary transfer strategy, trans-tokenization, designed to tackle this challenge and enable more efficient language adaptation. Our approach focuses on adapting a high-resource monolin…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at COLM 2024