Showing 1–50 of 126 results for author: Martins, T

Searching in archive cs.
  1. arXiv:2507.00994  [pdf, ps, other]

    cs.CL

    Should We Still Pretrain Encoders with Masked Language Modeling?

    Authors: Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Manuel Faysse, Duarte M. Alves, Emmanuel Malherbe, André F. T. Martins, Céline Hudelot, Pierre Colombo

    Abstract: Learning high-quality text representations is fundamental to a wide range of NLP tasks. While encoder pretraining has traditionally relied on Masked Language Modeling (MLM), recent evidence suggests that decoder models pretrained with Causal Language Modeling (CLM) can be effectively repurposed as encoders, often surpassing traditional encoders on text representation benchmarks. However, it remain…

    Submitted 4 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: 23 pages, 10 figures, 17 tables

  2. arXiv:2506.17080  [pdf, ps, other]

    cs.CL cs.AI

    Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs

    Authors: Ricardo Rei, Nuno M. Guerreiro, José Pombal, João Alves, Pedro Teixeirinha, Amin Farajian, André F. T. Martins

    Abstract: Fine-tuning pretrained LLMs has been shown to be an effective strategy for reaching state-of-the-art performance on specific tasks like machine translation. However, this process of adaptation often implies sacrificing general-purpose capabilities, such as conversational reasoning and instruction-following, hampering the utility of the system in real-world applications that require a mixture of sk…

    Submitted 20 June, 2025; originally announced June 2025.

  3. arXiv:2506.17019  [pdf, other]

    cs.CL cs.AI

    Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning

    Authors: Giuseppe Attanasio, Sonal Sannigrahi, Ben Peters, André F. T. Martins

    Abstract: This paper presents the IT-IST submission to the IWSLT 2025 Shared Task on Instruction Following Speech Processing. We submit results for the Short Track, i.e., speech recognition, translation, and spoken question answering. Our model is a unified speech-to-text model that integrates a pre-trained continuous speech encoder and text decoder through a first phase of modality alignment and a second p…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 7 pages, 1 figure, IWSLT 2025

  4. arXiv:2506.16640  [pdf, ps, other]

    cs.CL cs.AI

    Long-Context Generalization with Sparse Attention

    Authors: Pavlo Vasylenko, Marcos Treviso, André F. T. Martins

    Abstract: Transformer-based architectures traditionally employ softmax to compute attention weights, which produces dense distributions over all tokens in a sequence. While effective in many settings, this density has been shown to be detrimental for tasks that demand precise focus on fixed-size patterns: as sequence length increases, non-informative tokens accumulate attention probability mass, leading to…

    Submitted 24 June, 2025; v1 submitted 19 June, 2025; originally announced June 2025.

  5. arXiv:2506.13468  [pdf, ps, other]

    cs.CL cs.AI

    An Interdisciplinary Approach to Human-Centered Machine Translation

    Authors: Marine Carpuat, Omri Asscher, Kalika Bali, Luisa Bentivogli, Frédéric Blain, Lynne Bowker, Monojit Choudhury, Hal Daumé III, Kevin Duh, Ge Gao, Alvin Grissom II, Marzena Karpinska, Elaine C. Khoong, William D. Lewis, André F. T. Martins, Mary Nurminen, Douglas W. Oard, Maja Popovic, Michel Simard, François Yvon

    Abstract: Machine Translation (MT) tools are widely used today, often in contexts where professional translators are not present. Despite progress in MT technology, a gap persists between system development and real-world usage, particularly for non-expert users who may struggle to assess translation reliability. This paper advocates for a human-centered approach to MT, emphasizing the alignment of system d…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 20 pages

  6. arXiv:2506.06275  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding

    Authors: Emmanouil Zaranis, António Farinhas, Saul Santos, Beatriz Canaverde, Miguel Moura Ramos, Aditya K Surikuchi, André Viveiros, Baohao Liao, Elena Bueno-Benito, Nithin Sivakumaran, Pavlo Vasylenko, Shoubin Yu, Sonal Sannigrahi, Wafaa Mohammed, Ben Peters, Danae Sánchez Villegas, Elias Stengel-Eskin, Giuseppe Attanasio, Jaehong Yoon, Stella Frank, Alessandro Suglia, Chrysoula Zerva, Desmond Elliott, Mariella Dimiccoli, Mohit Bansal, et al. (6 additional authors not shown)

    Abstract: Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, "needle-in-a-haystack" details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Under Review

  7. arXiv:2506.04079  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    EuroLLM-9B: Technical Report

    Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Nuno M. Guerreiro, Ricardo Rei, Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José Pombal, Nicolas Boizard, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins

    Abstract: This report presents EuroLLM-9B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-9B's development, inclu…

    Submitted 16 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: 56 pages

  8. arXiv:2504.13713  [pdf, other]

    cs.RO cs.CV

    SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM

    Authors: Samuel Cerezo, Gaetano Meli, Tomás Berriel Martins, Kirill Safronov, Javier Civera

    Abstract: Models and methods originally developed for novel view synthesis and scene rendering, such as Neural Radiance Fields (NeRF) and Gaussian Splatting, are increasingly being adopted as representations in Simultaneous Localization and Mapping (SLAM). However, existing datasets fail to include the specific challenges of both fields, such as multimodality and sequentiality in SLAM or generalization acro…

    Submitted 21 April, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: 8 pages, 8 figures, RA-L submission

  9. arXiv:2504.12140  [pdf, other]

    cs.CL

    Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

    Authors: Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal, André F. T. Martins

    Abstract: Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena across sentences and paragraphs. In this work, we propose a method to improve LLM-based long-document translation through targeted fine-tuning on high-quality…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 9 pages, work-in-progress

  10. arXiv:2504.07583  [pdf, other]

    cs.CL cs.LG

    Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

    Authors: Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis, André F. T. Martins, Graham Neubig

    Abstract: Despite the steady progress in machine translation evaluation, existing automatic metrics struggle to capture how well meaning is preserved beyond sentence boundaries. We posit that reliance on a single intrinsic quality score, trained to mimic human judgments, might be insufficient for evaluating translations of long, complex passages, and a more "pragmatic" approach that assesses how accuratel…

    Submitted 11 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  11. arXiv:2504.04953  [pdf, other]

    cs.CL cs.AI

    M-Prometheus: A Suite of Open Multilingual LLM Judges

    Authors: José Pombal, Dongkeun Yoon, Patrick Fernandes, Ian Wu, Seungone Kim, Ricardo Rei, Graham Neubig, André F. T. Martins

    Abstract: The use of language models for automatically evaluating long-form text (LLM-as-a-judge) is becoming increasingly common, yet most LLM judges are optimized exclusively for English, with strategies for enhancing their multilingual evaluation capabilities remaining largely unexplored in the current literature. This has created a disparity in the quality of automatic evaluation methods for non-English…

    Submitted 7 April, 2025; originally announced April 2025.

  12. arXiv:2504.01001  [pdf, other]

    cs.CL cs.AI

    Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

    Authors: José Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins

    Abstract: As language models improve and become capable of performing more complex tasks across modalities, evaluating them automatically becomes increasingly challenging. Developing strong and robust task-specific automatic metrics gets harder, and human-annotated test sets -- which are expensive to create -- saturate more quickly. A compelling alternative is to design reliable strategies to automate the c…

    Submitted 1 April, 2025; originally announced April 2025.

  13. arXiv:2503.10620  [pdf, other]

    cs.CL

    From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM

    Authors: Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, Bruno Martins, Marcely Zanon Boito, André F. T. Martins

    Abstract: Large language models (LLMs) have shown remarkable performance and generalization capabilities across multiple languages and tasks, making them very attractive targets for multi-modality integration (e.g., images or speech). In this work, we extend an existing LLM to the speech modality via speech discretization and continued pre-training. In particular, we are interested in multilingual LLMs, suc…

    Submitted 13 March, 2025; originally announced March 2025.

  14. arXiv:2503.08327  [pdf, ps, other]

    cs.CL cs.AI

    Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation

    Authors: José Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins

    Abstract: As automatic metrics become increasingly stronger and widely adopted, the risk of unintentionally "gaming the metric" during model development rises. This issue is caused by metric interference (MINT), i.e., the use of the same or related metrics for both model tuning and evaluation. MINT can misguide practitioners into being overoptimistic about the performance of their systems: as system outputs…

    Submitted 18 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  15. arXiv:2502.16357  [pdf, other]

    cs.CL

    LegalBench.PT: A Benchmark for Portuguese Law

    Authors: Beatriz Canaverde, Telmo Pessoa Pires, Leonor Melo Ribeiro, André F. T. Martins

    Abstract: The recent application of LLMs to the legal field has spurred the creation of benchmarks across various jurisdictions and languages. However, no benchmark has yet been specifically designed for the Portuguese legal system. In this work, we present LegalBench.PT, the first comprehensive legal benchmark covering key areas of Portuguese law. To develop LegalBench.PT, we first collect long-form questi…

    Submitted 22 February, 2025; originally announced February 2025.

  16. arXiv:2502.14773  [pdf, other]

    cs.LG

    Sparse Activations as Conformal Predictors

    Authors: Margarida M. Campos, João Calém, Sophia Sklaviadis, Mário A. T. Figueiredo, André F. T. Martins

    Abstract: Conformal prediction is a distribution-free framework for uncertainty quantification that replaces point predictions with sets, offering marginal coverage guarantees (i.e., ensuring that the prediction sets contain the true label with a specified probability, in expectation). In this paper, we uncover a novel connection between conformal prediction and sparse softmax-like transformations, such as…

    Submitted 23 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted at AISTATS 2025

  17. arXiv:2502.12701  [pdf, other]

    cs.CL cs.AI cs.LG

    Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

    Authors: António Farinhas, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins

    Abstract: Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimat…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Preprint

  18. arXiv:2502.12082  [pdf, ps, other]

    cs.CL cs.LG

    AdaSplash: Adaptive Sparse Flash Attention

    Authors: Nuno Gonçalves, Marcos Treviso, André F. T. Martins

    Abstract: The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks. Adaptive sparsity, of which $α$-entmax attention is an example, offers a flexible data-dependent alternative, but existing implementations are inefficient and do not leverage the sparsity to obtain runtime and memory gains. In this work, we propose AdaSplash, which combines the effic…

    Submitted 8 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted as spotlight in ICML 2025

  19. arXiv:2502.10122  [pdf, other]

    cs.LG

    Modern Hopfield Networks with Continuous-Time Memories

    Authors: Saul Santos, António Farinhas, Daniel C. McNamee, André F. T. Martins

    Abstract: Recent research has established a connection between modern Hopfield networks (HNs) and transformer attention heads, with guarantees of exponential storage capacity. However, these models still face challenges scaling storage efficiently. Inspired by psychological theories of continuous neural resource allocation in working memory, we propose an approach that compresses large discrete Hopfield mem…

    Submitted 10 April, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

  20. arXiv:2501.19098  [pdf, other]

    cs.CV cs.LG

    $\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation

    Authors: Saul Santos, António Farinhas, Daniel C. McNamee, André F. T. Martins

    Abstract: Current video-language models struggle with long-video understanding due to limited context lengths and reliance on sparse frame subsampling, often leading to information loss. This paper introduces $\infty$-Video, which can process arbitrarily long videos through a continuous-time long-term memory (LTM) consolidation mechanism. Our framework augments video Q-formers by allowing them to process un…

    Submitted 19 May, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 17 pages, 7 figures

  21. arXiv:2412.04205  [pdf, ps, other]

    cs.CL

    A Context-aware Framework for Translation-mediated Conversations

    Authors: José Pombal, Sweta Agrawal, Patrick Fernandes, Emmanouil Zaranis, André F. T. Martins

    Abstract: Automatic translation systems offer a powerful solution to bridge language barriers in scenarios where participants do not share a common language. However, these systems can introduce errors leading to misunderstandings and conversation breakdown. A key issue is that current systems fail to incorporate the rich contextual information necessary to resolve ambiguities and omitted details, resulting…

    Submitted 29 June, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

  22. arXiv:2412.03304  [pdf, other]

    cs.CL

    Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

    Authors: Shivalika Singh, Angelika Romanou, Clémentine Fourrier, David I. Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, Raymond Ng, Shayne Longpre, Wei-Yin Ko, Sebastian Ruder, Madeline Smith, Antoine Bosselut, Alice Oh, Andre F. T. Martins, Leshem Choshen, Daphne Ippolito, Enzo Ferrante, Marzieh Fadaee, Beyza Ermis, Sara Hooker

    Abstract: Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from differences in language but also from the cultural knowledge required to interpret questions, reducing the practical utility of translated datasets like MMLU. Furthermore, translation often introduces artefacts that can distort the meaning or clarity of…

    Submitted 19 February, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  23. arXiv:2411.15043  [pdf, other]

    cs.CV cs.RO

    Open-Vocabulary Online Semantic Mapping for SLAM

    Authors: Tomas Berriel Martins, Martin R. Oswald, Javier Civera

    Abstract: This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, that we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These are computed from the viewpoints where they are observed by a novel CLIP merging method. Notably, our OVO has a significantly lower computational and memory footprint than…

    Submitted 10 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

  24. arXiv:2411.08590  [pdf, ps, other]

    cs.LG

    Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

    Authors: Saul Santos, Vlad Niculae, Daniel McNamee, André F. T. Martins

    Abstract: Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework -- Hopfield-Fenchel-Young networks -- which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference…

    Submitted 19 June, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: 49 pages, 14 figures. arXiv admin note: text overlap with arXiv:2402.13725

  25. arXiv:2411.05986  [pdf, other]

    cs.CL

    Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings

    Authors: Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins

    Abstract: Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, leading to inefficient learning signals due to the reward sparsity problem -- the model rece…

    Submitted 16 April, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: 12 pages, work-in-progress

  26. arXiv:2410.16246  [pdf, other]

    cs.CL

    Analyzing Context Contributions in LLM-based Machine Translation

    Authors: Emmanouil Zaranis, Nuno M. Guerreiro, André F. T. Martins

    Abstract: Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of the input context remain largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use vari…

    Submitted 21 October, 2024; originally announced October 2024.

  27. arXiv:2410.10995  [pdf, ps, other]

    cs.CL

    Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation

    Authors: Emmanouil Zaranis, Giuseppe Attanasio, Sweta Agrawal, André F. T. Martins

    Abstract: Quality estimation (QE) -- the automatic assessment of translation quality -- has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. While QE metrics have been optimized to align with human judgments, whether they encode social biases has been largely overlooked. Biased QE risks favoring certain demographic groups over others, e.g., by…

    Submitted 2 June, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: ACL 2025

  28. arXiv:2409.16235  [pdf, other]

    cs.CL

    EuroLLM: Multilingual Language Models for Europe

    Authors: Pedro Henrique Martins, Patrick Fernandes, João Alves, Nuno M. Guerreiro, Ricardo Rei, Duarte M. Alves, José Pombal, Amin Farajian, Manuel Faysse, Mateusz Klimaszewski, Pierre Colombo, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins

    Abstract: The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of understanding and generating text in all official European Union languages, as well as several additional relevant languages. We outline the progress made to date,…

    Submitted 24 September, 2024; originally announced September 2024.

  29. arXiv:2409.07131  [pdf, other]

    cs.CL cs.LG stat.ML

    Reranking Laws for Language Generation: A Communication-Theoretic Perspective

    Authors: António Farinhas, Haau-Sing Li, André F. T. Martins

    Abstract: To ensure large language models (LLMs) are used safely, one must reduce their propensity to hallucinate or to generate unacceptable answers. A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reranker to choose the best one. In this paper, we draw a parallel between this strategy and the use of redundancy to decrease the error rate in noisy comm…

    Submitted 10 February, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024 (spotlight)

  30. arXiv:2408.13745  [pdf, other]

    cs.CL cs.AI cs.PL

    DOCE: Finding the Sweet Spot for Execution-Based Code Generation

    Authors: Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

    Abstract: Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Decoding Objectives for Code Execution, a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decodi…

    Submitted 16 October, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 10 pages (32 including appendix), 5 figures, 25 tables. Prompts are provided in the GitHub repository to avoid potential text overlap with other papers

  31. arXiv:2407.05489  [pdf, other]

    cs.CL

    How Effective are State Space Models for Machine Translation?

    Authors: Hugo Pitorro, Pavlo Vasylenko, Marcos Treviso, André F. T. Martins

    Abstract: Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts. Recent works propose to replace attention with linear recurrent layers -- this is the case for state space models, which enjoy efficient training and inference. However, it remains unclear whether these models are competitive with transformers in machine translation (MT). In…

    Submitted 7 July, 2024; originally announced July 2024.

  32. arXiv:2407.00436  [pdf, other]

    cs.CL

    A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models

    Authors: Peiqin Lin, André F. T. Martins, Hinrich Schütze

    Abstract: Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, and general-purpose tasks, e.g., text classification. Building upon these findings, our comprehensive study aims to identify the most effective strategies for leveraging parallel corpora. We investigate…

    Submitted 8 February, 2025; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: NAACL 2025 Findings

  33. arXiv:2406.19482  [pdf, other]

    cs.CL

    xTower: A Multilingual LLM for Explaining and Correcting Translation Errors

    Authors: Marcos Treviso, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, José Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, André F. T. Martins

    Abstract: While machine translation (MT) systems are achieving increasingly strong performance on benchmarks, they often produce translations with errors and anomalies. Understanding these errors can potentially help improve the translation quality and user experience. This paper introduces xTower, an open large language model (LLM) built on top of TowerBase designed to provide free-text explanations for tr…

    Submitted 27 June, 2024; originally announced June 2024.

  34. arXiv:2406.18403  [pdf, ps, other]

    cs.CL

    LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

    Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

    Abstract: There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case of proprietary models. We provide JUDGE-BENCH, an extensible collection of 20 NLP datasets with human annotations covering a broad range of evaluated properties and types of data, and comprehensively…

    Submitted 2 June, 2025; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to the main conference of ACL 2025

  35. arXiv:2406.00049  [pdf, other]

    cs.CL cs.LG

    QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation

    Authors: Gonçalo R. A. Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José G. C. de Souza, André F. T. Martins

    Abstract: An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware an…

    Submitted 15 October, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

    Comments: Accepted at NEURIPS Main 2024

  36. arXiv:2405.18348  [pdf, other]

    cs.CL

    Can Automatic Metrics Assess High-Quality Translations?

    Authors: Sweta Agrawal, António Farinhas, Ricardo Rei, André F. T. Martins

    Abstract: Automatic metrics for evaluating translation quality are typically validated by measuring how well they correlate with human assessments. However, correlation methods tend to capture only the ability of metrics to differentiate between good and bad source-translation pairs, overlooking their reliability in distinguishing alternative translations for the same source. In this paper, we confirm that…

    Submitted 10 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted at EMNLP Main 2024

  37. arXiv:2405.15518  [pdf, other]

    cs.CV

    Feature Splatting for Better Novel View Synthesis with Low Overlap

    Authors: T. Berriel Martins, Javier Civera

    Abstract: 3D Gaussian Splatting has emerged as a very promising scene representation, achieving state-of-the-art quality in novel view synthesis significantly faster than competing alternatives. However, its use of spherical harmonics to represent scene colors limits the expressivity of 3D Gaussians and, as a consequence, the capability of the representation to generalize as we move away from the training v…

    Submitted 30 September, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  38. arXiv:2405.05116  [pdf, other]

    cs.CL

    XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples

    Authors: Peiqin Lin, André F. T. Martins, Hinrich Schütze

    Abstract: Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of English. However, adapting these methods to other languages, especially low-resource ones, poses challenges due to the scarcity of cross-lingual retrievers and annotated data. Thus, we introduce XAMP…

    Submitted 8 February, 2025; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: NAACL 2025 Findings

  39. arXiv:2405.01976  [pdf, other]

    cs.CL cs.LG

    Conformal Prediction for Natural Language Processing: A Survey

    Authors: Margarida M. Campos, António Farinhas, Chrysoula Zerva, Mário A. T. Figueiredo, André F. T. Martins

    Abstract: The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications. Conformal prediction is emerging as a theoretically sound and practically useful framework, combining flexibility with strong statistica…

    Submitted 3 May, 2024; originally announced May 2024.

  40. arXiv:2403.08314  [pdf, other]

    cs.CL

    Is Context Helpful for Chat Translation Evaluation?

    Authors: Sweta Agrawal, Amin Farajian, Patrick Fernandes, Ricardo Rei, André F. T. Martins

    Abstract: Despite the recent success of automatic metrics for assessing translation quality, their application in evaluating the quality of machine-translated chats has been limited. Unlike more structured texts like news, chat conversations are often unstructured, short, and heavily reliant on contextual information. This poses questions about the reliability of existing sentence-level metrics in this doma…

    Submitted 13 March, 2024; originally announced March 2024.

  41. arXiv:2403.03923  [pdf, other]

    cs.CL

    Did Translation Models Get More Robust Without Anyone Even Noticing?

    Authors: Ben Peters, André F. T. Martins

    Abstract: Neural machine translation (MT) models achieve strong results across a variety of settings, but it is widely believed that they are highly sensitive to "noisy" inputs, such as spelling errors, abbreviations, and other formatting issues. In this paper, we revisit this insight in light of recent multilingual MT models and large language models (LLMs) applied to machine translation. Somewhat surprisi…

    Submitted 6 March, 2024; originally announced March 2024.

  42. arXiv:2403.03883  [pdf, other]

    cs.CL

    SaulLM-7B: A pioneering Large Language Model for Law

    Authors: Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa

    Abstract: In this paper, we introduce SaulLM-7B, a large language model (LLM) tailored for the legal domain. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens. SaulLM-7B exhibits state-of-the-art proficiency i…

    Submitted 7 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  43. arXiv:2402.17733  [pdf, other]

    cs.CL

    Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

    Authors: Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

    Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and pa…

    Submitted 27 February, 2024; originally announced February 2024.

  44. arXiv:2402.13725  [pdf, other]

    cs.LG

    Sparse and Structured Hopfield Networks

    Authors: Saul Santos, Vlad Niculae, Daniel McNamee, Andre F. T. Martins

    Abstract: Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss mar…

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 20 pages, 4 figures

  45. arXiv:2402.00786  [pdf, other]

    cs.CL cs.LG

    CroissantLLM: A Truly Bilingual French-English Language Model

    Authors: Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

    Abstract: We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a cust…

    Submitted 9 April, 2025; v1 submitted 1 February, 2024; originally announced February 2024.

  46. arXiv:2402.00707  [pdf, other]

    cs.CL cs.AI cs.LG

    Non-Exchangeable Conformal Language Generation with Nearest Neighbors

    Authors: Dennis Ulmer, Chrysoula Zerva, André F. T. Martins

    Abstract: Quantifying uncertainty in automatically generated text is important for letting humans check potential hallucinations and making systems more reliable. Conformal prediction is an attractive framework to provide predictions imbued with statistical guarantees, however, its application to text generation is challenging since any i.i.d. assumptions are not realistic. In this paper, we bridge this gap…

    Submitted 1 February, 2024; originally announced February 2024.

  47. arXiv:2401.13303  [pdf, other]

    cs.CL

    MaLA-500: Massive Language Adaptation of Large Language Models

    Authors: Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze

    Abstract: Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we em…

    Submitted 3 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  48. arXiv:2311.09132  [pdf, other]

    cs.CL

    Aligning Neural Machine Translation Models: Human Feedback in Training and Inference

    Authors: Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins

    Abstract: Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by a language model, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs. In machine translation (MT), where metrics trained from…

    Submitted 4 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: EAMT 2024

  49. Towards the automation of book typesetting

    Authors: Sérgio M. Rebelo, Tiago Martins, Diogo Ferreira, Artur Rebelo

    Abstract: This paper proposes a generative approach for the automatic typesetting of books in desktop publishing. The presented system consists in a computer script that operates inside a widely used design software tool and implements a generative process based on several typographic rules, styles and principles which have been identified in the literature. The performance of the proposed system is tested…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 26 pages, 5 figures. Revised version published at Visual Informatics, 7(2), pp. 1–12

    Journal ref: Visual Informatics, (2023) 7(2), 1–12

  50. arXiv:2310.13448  [pdf, other]

    cs.CL

    Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning

    Authors: Duarte M. Alves, Nuno M. Guerreiro, João Alves, José Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, André F. T. Martins

    Abstract: Large language models (LLMs) are a promising avenue for machine translation (MT). However, current LLM-based MT systems are brittle: their effectiveness highly depends on the choice of few-shot examples and they often require extra post-processing due to overgeneration. Alternatives such as finetuning on translation instructions are computationally expensive and may weaken in-context learning capa…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 - Findings