Showing 1–35 of 35 results for author: Rei, R

  1. arXiv:2506.04079  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    EuroLLM-9B: Technical Report

    Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Nuno M. Guerreiro, Ricardo Rei, Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José Pombal, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins

    Abstract: This report presents EuroLLM-9B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-9B's development, inclu…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 56 pages

  2. arXiv:2504.04953  [pdf, other]

    cs.CL cs.AI

    M-Prometheus: A Suite of Open Multilingual LLM Judges

    Authors: José Pombal, Dongkeun Yoon, Patrick Fernandes, Ian Wu, Seungone Kim, Ricardo Rei, Graham Neubig, André F. T. Martins

    Abstract: The use of language models for automatically evaluating long-form text (LLM-as-a-judge) is becoming increasingly common, yet most LLM judges are optimized exclusively for English, with strategies for enhancing their multilingual evaluation capabilities remaining largely unexplored in the current literature. This has created a disparity in the quality of automatic evaluation methods for non-English…

    Submitted 7 April, 2025; originally announced April 2025.

  3. arXiv:2504.01001  [pdf, other]

    cs.CL cs.AI

    Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

    Authors: José Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins

    Abstract: As language models improve and become capable of performing more complex tasks across modalities, evaluating them automatically becomes increasingly challenging. Developing strong and robust task-specific automatic metrics gets harder, and human-annotated test sets -- which are expensive to create -- saturate more quickly. A compelling alternative is to design reliable strategies to automate the c…

    Submitted 1 April, 2025; originally announced April 2025.

  4. arXiv:2503.22973  [pdf, other]

    cs.CL cs.AI cs.LG

    XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation

    Authors: Vivek Iyer, Ricardo Rei, Pinzhen Chen, Alexandra Birch

    Abstract: Cross-lingual open-ended generation -- i.e. generating responses in a desired language different from that of the user's query -- is an important yet understudied problem. We introduce XL-AlpacaEval, a new benchmark for evaluating cross-lingual generation capabilities in Large Language Models (LLMs), and propose XL-Instruct, a high-quality synthetic data generation method. Fine-tuning with just 8K…

    Submitted 29 March, 2025; originally announced March 2025.

  5. arXiv:2503.08327  [pdf, other]

    cs.CL cs.AI

    Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation

    Authors: José Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins

    Abstract: As automatic metrics become increasingly stronger and widely adopted, the risk of unintentionally "gaming the metric" during model development rises. This issue is caused by metric interference (Mint), i.e., the use of the same or related metrics for both model tuning and evaluation. Mint can misguide practitioners into being overoptimistic about the performance of their systems: as system outputs…

    Submitted 11 March, 2025; originally announced March 2025.

  6. arXiv:2503.05500  [pdf, other]

    cs.CL cs.AI

    EuroBERT: Scaling Multilingual Encoders for European Languages

    Authors: Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André Martins, Ayoub Hammal, Caio Corro, Céline Hudelot, Emmanuel Malherbe, Etienne Malaboeuf, Fanny Jourdan, Gabriel Hautreux, João Alves, Kevin El-Haddad, Manuel Faysse, Maxime Peyrard, Nuno M. Guerreiro, Patrick Fernandes, Ricardo Rei, Pierre Colombo

    Abstract: General-purpose multilingual vector representations, used in retrieval, regression and classification, are traditionally obtained from bidirectional encoder models. Despite their wide applicability, encoders have been recently overshadowed by advances in generative decoder-only models. However, many innovations driving this progress are not inherently tied to decoders. In this paper, we revisit th…

    Submitted 26 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 28 pages, 8 figures, 13 tables

  7. arXiv:2502.12701  [pdf, other]

    cs.CL cs.AI cs.LG

    Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

    Authors: António Farinhas, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins

    Abstract: Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimat…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Preprint

  8. arXiv:2502.12404  [pdf, other]

    cs.CL

    WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects

    Authors: Daniel Deutsch, Eleftheria Briakou, Isaac Caswell, Mara Finkelstein, Rebecca Galor, Juraj Juraska, Geza Kovacs, Alison Lui, Ricardo Rei, Jason Riesa, Shruti Rijhwani, Parker Riley, Elizabeth Salesky, Firas Trabelsi, Stephanie Winkler, Biao Zhang, Markus Freitag

    Abstract: As large language models (LLM) become more and more capable in languages other than English, it is important to collect benchmark datasets in order to evaluate their multilingual performance, including on tasks like machine translation (MT). In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages and dialects in…

    Submitted 17 February, 2025; originally announced February 2025.

  9. arXiv:2412.15745  [pdf, other]

    cs.CE

    Dynamic Learning Rate Decay for Stochastic Variational Inference

    Authors: Maximilian Dinkel, Gil Robalo Rei, Wolfgang A. Wall

    Abstract: Like many optimization algorithms, Stochastic Variational Inference (SVI) is sensitive to the choice of the learning rate. If the learning rate is too small, the optimization process may be slow, and the algorithm might get stuck in local optima. On the other hand, if the learning rate is too large, the algorithm may oscillate or diverge, failing to converge to a solution. Adaptive learning rate m…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: 15 pages, 8 figures

  10. arXiv:2410.07779  [pdf, other]

    cs.CL

    Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation

    Authors: Sweta Agrawal, José G. C. de Souza, Ricardo Rei, António Farinhas, Gonçalo Faria, Patrick Fernandes, Nuno M Guerreiro, Andre Martins

    Abstract: Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the othe…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Accepted at EMNLP Main 2024

  11. arXiv:2409.20059  [pdf, other]

    cs.CL

    Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

    Authors: Hippolyte Gisserot-Boukhlef, Ricardo Rei, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo, Nuno M. Guerreiro

    Abstract: Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics through quality-informed decoding strategies, achieving better results than likelihood-based methods. With the rise of Large Language Models (LLMs), preference-bas…

    Submitted 30 September, 2024; originally announced September 2024.

  12. arXiv:2409.16235  [pdf, other]

    cs.CL

    EuroLLM: Multilingual Language Models for Europe

    Authors: Pedro Henrique Martins, Patrick Fernandes, João Alves, Nuno M. Guerreiro, Ricardo Rei, Duarte M. Alves, José Pombal, Amin Farajian, Manuel Faysse, Mateusz Klimaszewski, Pierre Colombo, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins

    Abstract: The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of understanding and generating text in all official European Union languages, as well as several additional relevant languages. We outline the progress made to date,…

    Submitted 24 September, 2024; originally announced September 2024.

  13. arXiv:2406.19482  [pdf, other]

    cs.CL

    xTower: A Multilingual LLM for Explaining and Correcting Translation Errors

    Authors: Marcos Treviso, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, José Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, André F. T. Martins

    Abstract: While machine translation (MT) systems are achieving increasingly strong performance on benchmarks, they often produce translations with errors and anomalies. Understanding these errors can potentially help improve the translation quality and user experience. This paper introduces xTower, an open large language model (LLM) built on top of TowerBase designed to provide free-text explanations for tr…

    Submitted 27 June, 2024; originally announced June 2024.

  14. arXiv:2406.00049  [pdf, other]

    cs.CL cs.LG

    QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation

    Authors: Gonçalo R. A. Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José G. C. de Souza, André F. T. Martins

    Abstract: An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware an…

    Submitted 15 October, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

    Comments: Accepted at NEURIPS Main 2024

  15. arXiv:2405.18348  [pdf, other]

    cs.CL

    Can Automatic Metrics Assess High-Quality Translations?

    Authors: Sweta Agrawal, António Farinhas, Ricardo Rei, André F. T. Martins

    Abstract: Automatic metrics for evaluating translation quality are typically validated by measuring how well they correlate with human assessments. However, correlation methods tend to capture only the ability of metrics to differentiate between good and bad source-translation pairs, overlooking their reliability in distinguishing alternative translations for the same source. In this paper, we confirm that…

    Submitted 10 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted at EMNLP Main 2024

  16. arXiv:2403.08314  [pdf, other]

    cs.CL

    Is Context Helpful for Chat Translation Evaluation?

    Authors: Sweta Agrawal, Amin Farajian, Patrick Fernandes, Ricardo Rei, André F. T. Martins

    Abstract: Despite the recent success of automatic metrics for assessing translation quality, their application in evaluating the quality of machine-translated chats has been limited. Unlike more structured texts like news, chat conversations are often unstructured, short, and heavily reliant on contextual information. This poses questions about the reliability of existing sentence-level metrics in this doma…

    Submitted 13 March, 2024; originally announced March 2024.

  17. arXiv:2402.17733  [pdf, other]

    cs.CL

    Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

    Authors: Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

    Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and pa…

    Submitted 27 February, 2024; originally announced February 2024.

  18. arXiv:2402.00786  [pdf, other]

    cs.CL cs.LG

    CroissantLLM: A Truly Bilingual French-English Language Model

    Authors: Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

    Abstract: We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a cust…

    Submitted 9 April, 2025; v1 submitted 1 February, 2024; originally announced February 2024.

  19. arXiv:2312.08085  [pdf, other]

    cs.CE

    Solving Bayesian Inverse Problems With Expensive Likelihoods Using Constrained Gaussian Processes and Active Learning

    Authors: Maximilian Dinkel, Carolin M. Geitner, Gil Robalo Rei, Jonas Nitzler, Wolfgang A. Wall

    Abstract: Solving inverse problems using Bayesian methods can become prohibitively expensive when likelihood evaluations involve complex and large scale numerical models. A common approach to circumvent this issue is to approximate the forward model or the likelihood function with a surrogate model. But also there, due to limited computational resources, only a few training points are available in many prac…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 21 pages, 15 figures

  20. arXiv:2311.09828  [pdf, other]

    cs.CL

    AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

    Authors: Jiayi Wang, David Ifeoluwa Adelani, Sweta Agrawal, Marek Masiak, Ricardo Rei, Eleftheria Briakou, Marine Carpuat, Xuanli He, Sofia Bourhim, Andiswa Bukula, Muhidin Mohamed, Temitayo Olatoye, Tosin Adewumi, Hamam Mokayed, Christine Mwase, Wangui Kimotho, Foutse Yuehgoh, Anuoluwapo Aremu, Jessica Ojo, Shamsuddeen Hassan Muhammad, Salomey Osei, Abdul-Hakeem Omotayo, Chiamaka Chukwuneke, Perez Ogayo, Oumaima Hourrane, et al. (33 additional authors not shown)

    Abstract: Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of eval…

    Submitted 23 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted by NAACL 2024

  21. arXiv:2310.13448  [pdf, other]

    cs.CL

    Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning

    Authors: Duarte M. Alves, Nuno M. Guerreiro, João Alves, José Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, André F. T. Martins

    Abstract: Large language models (LLMs) are a promising avenue for machine translation (MT). However, current LLM-based MT systems are brittle: their effectiveness highly depends on the choice of few-shot examples and they often require extra post-processing due to overgeneration. Alternatives such as finetuning on translation instructions are computationally expensive and may weaken in-context learning capa…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 - Findings

  22. arXiv:2310.10482  [pdf, other]

    cs.CL

    xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection

    Authors: Nuno M. Guerreiro, Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, André F. T. Martins

    Abstract: Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT, estimate the quality of a translation hypothesis by providing a single sentence-level score. As such, they offer little insight into translation errors (e.g., what are the errors and what is their severity). On the other hand, generative large language models (LLMs) are amplifying the adoption of more granula…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Work in progress

  23. arXiv:2309.11925  [pdf, other]

    cs.CL

    Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task

    Authors: Ricardo Rei, Nuno M. Guerreiro, José Pombal, Daan van Stigt, Marcos Treviso, Luisa Coheur, José G. C. de Souza, André F. T. Martins

    Abstract: We present the joint contribution of Unbabel and Instituto Superior Técnico to the WMT 2023 Shared Task on Quality Estimation (QE). Our team participated on all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2). For all tasks, we build on the COMETKIWI-22 model (Rei et al., 2022b). Our multilingual approaches are ranked first for all tasks,…

    Submitted 21 September, 2023; originally announced September 2023.

  24. arXiv:2305.11806  [pdf, other]

    cs.CL

    The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics

    Authors: Ricardo Rei, Nuno M. Guerreiro, Marcos Treviso, Luisa Coheur, Alon Lavie, André F. T. Martins

    Abstract: Neural metrics for machine translation evaluation, such as COMET, exhibit significant improvements in their correlation with human judgments, as compared to traditional metrics based on lexical overlap, such as BLEU. Yet, neural metrics are, to a great extent, "black boxes" returning a single sentence-level score without transparency about the decision-making process. In this work, we develop and…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

  25. arXiv:2209.06243  [pdf, other]

    cs.CL cs.LG

    CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task

    Authors: Ricardo Rei, Marcos Treviso, Nuno M. Guerreiro, Chrysoula Zerva, Ana C. Farinha, Christine Maroti, José G. C. de Souza, Taisiya Glushkova, Duarte M. Alves, Alon Lavie, Luisa Coheur, André F. T. Martins

    Abstract: We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated on all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it w…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: WMT 2022 Quality Estimation shared task

  26. arXiv:2207.11774  [pdf, other]

    cs.CL

    Towards a Sentiment-Aware Conversational Agent

    Authors: Isabel Dias, Ricardo Rei, Patrícia Pereira, Luisa Coheur

    Abstract: In this paper, we propose an end-to-end sentiment-aware conversational agent based on two models: a reply sentiment prediction model, which leverages the context of the dialogue to predict an appropriate sentiment for the agent to express in its reply; and a text generation model, which is conditioned on the predicted sentiment and the context of the dialogue, to produce a reply that is both conte…

    Submitted 24 July, 2022; originally announced July 2022.

  27. arXiv:2205.00978  [pdf, other]

    cs.CL

    Quality-Aware Decoding for Neural Machine Translation

    Authors: Patrick Fernandes, António Farinhas, Ricardo Rei, José G. C. de Souza, Perez Ogayo, Graham Neubig, André F. T. Martins

    Abstract: Despite the progress in machine translation quality estimation and evaluation in the last years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search. In this paper, we bring together these two lines of research and propose quality-aware decoding for NMT…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: NAACL2022

  28. arXiv:2204.06546  [pdf, other]

    cs.CL

    Disentangling Uncertainty in Machine Translation Evaluation

    Authors: Chrysoula Zerva, Taisiya Glushkova, Ricardo Rei, André F. T. Martins

    Abstract: Trainable evaluation metrics for machine translation (MT) exhibit strong correlation with human judgements, but they are often hard to interpret and might produce unreliable scores under noisy or out-of-domain data. Recent work has attempted to mitigate this with simple uncertainty quantification techniques (Monte Carlo dropout and deep ensembles), however these techniques (as we show) are limited…

    Submitted 29 November, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: accepted at EMNLP 2022

  29. arXiv:2203.04507  [pdf, other]

    cs.CL

    Onception: Active Learning with Expert Advice for Real World Machine Translation

    Authors: Vânia Mendonça, Ricardo Rei, Luisa Coheur, Alberto Sardinha

    Abstract: Active learning can play an important role in low-resource settings (i.e., where annotated data is scarce), by selecting which instances may be more worthy to annotate. Most active learning approaches for Machine Translation assume the existence of a pool of sentences in a source language, and rely on human annotators to provide translations or post-edits, which can still be costly. In this articl…

    Submitted 12 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Submitted to Computational Linguistics

  30. Uncertainty-Aware Machine Translation Evaluation

    Authors: Taisiya Glushkova, Chrysoula Zerva, Ricardo Rei, André F. T. Martins

    Abstract: Several neural-based metrics have been recently proposed to evaluate machine translation quality. However, all of them resort to point estimates, which provide limited information at segment level. This is made worse as they are trained on noisy, biased and scarce human judgements, often resulting in unreliable quality predictions. In this paper, we introduce uncertainty-aware MT evaluation and an…

    Submitted 24 March, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021 v2: corrected typos (esp. Tab 5)

  31. arXiv:2105.13385  [pdf, other]

    cs.CL

    Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort

    Authors: Vânia Mendonça, Ricardo Rei, Luisa Coheur, Alberto Sardinha, Ana Lúcia Santos

    Abstract: In Machine Translation, assessing the quality of a large amount of automatic translations can be challenging. Automatic metrics are not reliable when it comes to high performing systems. In addition, resorting to human evaluators can be expensive, especially when evaluating multiple systems. To overcome the latter challenge, we propose a novel application of online learning that, given an ensemble…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: Accepted to ACL-IJCNLP 2021 Main Conference (long paper)

  32. arXiv:2102.00461  [pdf, other]

    cs.CL stat.AP stat.ML

    Multilingual Email Zoning

    Authors: Bruno Jardim, Ricardo Rei, Mariana S. C. Almeida

    Abstract: The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature regarding email zoning corpora and systems was developed essentially for English. In this paper, we analyse the existing email zoning corpora and pro…

    Submitted 13 February, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted at EACL 2021 SRW (https://sites.google.com/view/eaclsrw2021/home); 6 pages with 2 Figures and 8 Tables, plus references; Cleverly Multilingual Zoning Corpus available at https://github.com/cleverly-ai/multilingual-email-zoning

  33. arXiv:2010.15535  [pdf, ps, other]

    cs.CL

    Unbabel's Participation in the WMT20 Metrics Shared Task

    Authors: Ricardo Rei, Craig Stewart, Catarina Farinha, Alon Lavie

    Abstract: We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate on the segment-level, document-level and system-level tracks on all language pairs, as well as the 'QE as a Metric' track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently propose…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: WMT Metrics Shared Task 2020

  34. arXiv:2009.09025  [pdf, other]

    cs.CL

    COMET: A Neural Framework for MT Evaluation

    Authors: Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie

    Abstract: We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a ta…

    Submitted 19 October, 2020; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: EMNLP 2020

  35. arXiv:2009.01657  [pdf, other]

    eess.IV cs.LG

    A free web service for fast COVID-19 classification of chest X-Ray images

    Authors: Jose David Bermudez Castro, Ricardo Rei, Jose E. Ruiz, Pedro Achanccaray Diaz, Smith Arauco Canchumuni, Cristian Muñoz Villalobos, Felipe Borges Coelho, Leonardo Forero Mendoza, Marco Aurelio C. Pacheco

    Abstract: The coronavirus outbreak became a major concern for society worldwide. Technological innovation and ingenuity are essential to fight COVID-19 pandemic and bring us one step closer to overcome it. Researchers over the world are working actively to find available alternatives in different fields, such as the Healthcare System, pharmaceutic, health prevention, among others. With the rise of artificia…

    Submitted 27 August, 2020; originally announced September 2020.

    Comments: 14 pages, 12 figures