Showing 1–2 of 2 results for author: Lérida, J d P

  1. arXiv:2501.16533  [pdf, other]

    cs.CL cs.LG

    A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain

    Authors: Jorge del Pozo Lérida, Kamil Kojs, János Máté, Mikołaj Antoni Barański, Christian Hardmeier

    Abstract: Large Language Models (LLMs) have become state-of-the-art in Machine Translation (MT), often trained on massive bilingual parallel corpora scraped from the web, that contain low-quality entries and redundant information, leading to significant computational challenges. Various data filtering methods exist to reduce dataset sizes, but their effectiveness largely varies based on specific language pa…

    Submitted 27 January, 2025; originally announced January 2025.

  2. arXiv:2501.10727  [pdf, other]

    cs.CV cs.AI eess.IV

    In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review

    Authors: Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Sarah de Boer, Víctor M. Campello, Aasa Feragen, Enzo Ferrante, Melanie Ganz, Judy Wawira Gichoya, Camila González, Steff Groefsema, Alessa Hering, Adam Hulman, Leo Joskowicz, Dovile Juodelyte, Melih Kandemir, Thijs Kooi, Jorge del Pozo Lérida, Livie Yumeng Li, Andre Pacheco, Tim Rädsch, Mauricio Reyes, Théo Sourget, Bram van Ginneken, David Wen, Nina Weng, et al. (4 additional authors not shown)

    Abstract: Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for s…

    Submitted 2 June, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

    Comments: ACM Conference on Fairness, Accountability, and Transparency - FAccT 2025