
Showing 1–2 of 2 results for author: Toossi, H

  1. arXiv:2405.11125

    cs.CL

    A Reproducibility Study on Quantifying Language Similarity: The Impact of Missing Values in the URIEL Knowledge Base

    Authors: Hasti Toossi, Guo Qing Huai, Jinyu Liu, Eric Khiu, A. Seza Doğruöz, En-Shiun Annie Lee

    Abstract: In the pursuit of supporting more languages around the world, tools that characterize properties of languages play a key role in expanding the existing multilingual NLP research. In this study, we focus on a widely used typological knowledge base, URIEL, which aggregates linguistic information into numeric vectors. Specifically, we delve into the soundness and reproducibility of the approach taken…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: NAACL 2024 SRW

  2. arXiv:2402.02633

    cs.CL cs.LG

    Predicting Machine Translation Performance on Low-Resource Languages: The Role of Domain Similarity

    Authors: Eric Khiu, Hasti Toossi, David Anugraha, Jinyu Liu, Jiaxu Li, Juan Armando Parra Flores, Leandro Acros Roman, A. Seza Doğruöz, En-Shiun Annie Lee

    Abstract: Fine-tuning and testing a multilingual large language model is expensive and challenging for low-resource languages (LRLs). While previous studies have predicted the performance of natural language processing (NLP) tasks using machine learning methods, they primarily focus on high-resource languages, overlooking LRLs and shifts across domains. Focusing on LRLs, we investigate three factors: the si…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures, accepted to EACL 2024, findings