Showing 1–3 of 3 results for author: Perez-de-Viñaspre, O

  1. arXiv:2505.02456  [pdf, other]

    cs.CL

    Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs

    Authors: Elisa Forcada Rodríguez, Olatz Perez-de-Viñaspre, Jon Ander Campos, Dietrich Klakow, Vagrant Gautam

    Abstract: One of the goals of fairness research in NLP is to measure and mitigate stereotypical biases that are propagated by NLP systems. However, such work tends to focus on single axes of bias (most often gender) and the English language. Addressing these limitations, we contribute the first study of multilingual intersecting country and gender biases, with a focus on occupation recommendations generated…

    Submitted 5 May, 2025; originally announced May 2025.

  2. arXiv:2205.01506  [pdf, other]

    cs.CL cs.AI

    BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions

    Authors: Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

    Abstract: Parliamentary transcripts provide a valuable resource to understand the reality and know about the most important facts that occur over time in our societies. Furthermore, the political debates captured in these transcripts facilitate research on political discourse from a computational social science perspective. In this paper we release the first version of a newly compiled corpus from Basque pa…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 9 pages, 14 figures, 4 tables. To be published in LREC 2022

  3. arXiv:2203.08111  [pdf, other]

    cs.CL cs.AI cs.LG

    Does Corpus Quality Really Matter for Low-Resource Languages?

    Authors: Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

    Abstract: The vast majority of non-English corpora are derived from automatically filtered versions of CommonCrawl. While prior work has identified major issues on the quality of these datasets (Kreutzer et al., 2021), it is not clear how this impacts downstream performance. Taking representation learning in Basque as a case study, we explore tailored crawling (manually identifying and scraping websites wit…

    Submitted 26 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: EMNLP 2022