
Showing 1–50 of 102 results for author: Ribeiro, R

  1. arXiv:2506.07671  [pdf, ps, other]

    cs.CL cs.AI

    GaRAGe: A Benchmark with Grounding Annotations for RAG Evaluation

    Authors: Ionut-Teodor Sorodoc, Leonardo F. R. Ribeiro, Rexhina Blloshmi, Christopher Davis, Adrià de Gispert

    Abstract: We present GaRAGe, a large RAG benchmark with human-curated long-form answers and annotations of each grounding passage, allowing a fine-grained evaluation of whether LLMs can identify relevant grounding when generating RAG answers. Our benchmark contains 2366 questions of diverse complexity, dynamism, and topics, and includes over 35K annotated passages retrieved from both private document sets a…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: ACL 2025 (Findings)

  2. arXiv:2506.02811  [pdf, ps, other]

    cs.LG

    CART-based Synthetic Tabular Data Generation for Imbalanced Regression

    Authors: António Pedro Pinheiro, Rita P. Ribeiro

    Abstract: Handling imbalanced target distributions in regression tasks remains a significant challenge in tabular data settings where underrepresented regions can hinder model performance. Among data-level solutions, some proposals, such as random sampling and SMOTE-based approaches, adapt classification techniques to regression tasks. However, these methods typically rely on crisp, artificial th…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 15 pages, 2 figures, 5 tables, 1 algorithm

  3. arXiv:2505.10089  [pdf, other]

    cs.CL

    XRAG: Cross-lingual Retrieval-Augmented Generation

    Authors: Wei Liu, Sony Trenous, Leonardo F. R. Ribeiro, Bill Byrne, Felix Hieber

    Abstract: We propose XRAG, a novel benchmark designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation (RAG) settings where the user language does not match the retrieval results. XRAG is constructed from recent news articles to ensure that its questions require external knowledge to be answered. It covers the real-world scenarios of monolingual and multilingual r…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.06393  [pdf, other]

    cs.CV

    Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark

    Authors: Valfride Nascimento, Gabriel E. Lima, Rafael O. Ribeiro, William Robson Schwartz, Rayson Laroca, David Menotti

    Abstract: Recent advancements in super-resolution for License Plate Recognition (LPR) have sought to address challenges posed by low-resolution (LR) and degraded images in surveillance, traffic monitoring, and forensic applications. However, existing studies have relied on private datasets and simplistic degradation models. To address this gap, we introduce UFPR-SR-Plates, a novel dataset containing 10,000…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted for publication in the Journal of the Brazilian Computer Society

  5. arXiv:2505.05949  [pdf, ps, other]

    cs.CL

    NeoQA: Evidence-based Question Answering with Generated News Events

    Authors: Max Glockner, Xiang Jiang, Leonardo F. R. Ribeiro, Iryna Gurevych, Markus Dreyer

    Abstract: Evaluating Retrieval-Augmented Generation (RAG) in large language models (LLMs) is challenging because benchmarks can quickly become stale. Questions initially requiring retrieval may become answerable from pretraining knowledge as newer models incorporate more recent information during pretraining, making it difficult to distinguish evidence-based reasoning from recall. We introduce NeoQA (News E…

    Submitted 9 May, 2025; originally announced May 2025.

  6. arXiv:2504.16028  [pdf, ps, other]

    eess.SY cs.MA math.OC

    Hessian Riemannian Flow For Multi-Population Wardrop Equilibrium

    Authors: Tigran Bakaryan, Christoph Aoun, Ricardo de Lima Ribeiro, Naira Hovakimyan, Diogo Gomes

    Abstract: In this paper, we address the problem of optimizing flows on generalized graphs that feature multiple entry points and multiple populations, each with varying cost structures. We tackle this problem by considering the multi-population Wardrop equilibrium, defined through variational inequalities. We rigorously analyze the existence and uniqueness of the Wardrop equilibrium. Furthermore, we introdu…

    Submitted 22 April, 2025; originally announced April 2025.

  7. arXiv:2504.14963  [pdf, ps, other]

    cs.CL cs.AI cs.CE cs.LG cs.NE

    Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues

    Authors: Rui Ribeiro, Luísa Coheur, Joao P. Carvalho

    Abstract: Speaker identification using voice recordings leverages unique acoustic features, but this approach fails when only textual data is available. Few approaches have attempted to tackle the problem of identifying speakers solely from text, and the existing ones have primarily relied on traditional methods. In this work, we explore the use of fuzzy fingerprints from large pre-trained models to improve…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Paper accepted at the FUZZY IEEE 2025 conference

  8. arXiv:2502.08972  [pdf, other]

    cs.CL cs.AI

    Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning

    Authors: Hyundong Cho, Karishma Sharma, Nicolaas Jedema, Leonardo F. R. Ribeiro, Alessandro Moschitti, Ravi Krishnan, Jonathan May

    Abstract: Language models are aligned to the collective voice of many, resulting in generic outputs that do not align with specific users' styles. In this work, we present Trial-Error-Explain In-Context Learning (TICL), a tuning-free method that personalizes language models for text generation tasks with fewer than 10 examples per user. TICL iteratively expands an in-context learning prompt via a trial-erro…

    Submitted 5 April, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: NAACL 2025 Findings

  9. arXiv:2501.17568  [pdf, other]

    cs.LG

    Histogram Approaches for Imbalanced Data Streams Regression

    Authors: Ehsan Aminian, Rita P. Ribeiro, Joao Gama

    Abstract: Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. While existing research has primarily focused on batch learning from static datasets, limited attention has been given to imbalanced regression in online learning scenarios. Intending to address this gap, in prior work, we proposed sampling strategies based on Chebyshev's i…

    Submitted 13 March, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  10. arXiv:2501.13706  [pdf]

    cs.CE physics.comp-ph

    Analysis of Eccentric Coaxial Waveguides Filled with Lossy Anisotropic Media via Finite Difference

    Authors: Raul O. Ribeiro, Maria A. Martinez, Guilherme S. Rosa, Rafael A. Penchel

    Abstract: This study presents a finite difference method (FDM) to model the electromagnetic field propagation in eccentric coaxial waveguides filled with lossy uniaxially anisotropic media. The formulation utilizes conformal transformation to map the eccentric circular waveguide into an equivalent concentric one. In the concentric problem, we introduce a novel normalized Helmholtz equation to decouple TM an…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: This work was presented at the SBMO 2024 - XXI Brazilian Symposium on Microwaves and Optoelectronics. For more information about the conference, please visit https://www.sbmo.org.br/sbmo/2024/home

  11. Higher-Order Spectral Element Methods for Electromagnetic Modeling of Complex Anisotropic Waveguides

    Authors: Raul Oliveira Ribeiro

    Abstract: This research thesis presents a novel higher-order spectral element method (SEM) formulated in cylindrical coordinates for analyzing electromagnetic fields in waveguides filled with complex anisotropic media. In this study, we consider a large class of cylindrical waveguides: radially-bounded and radially-unbounded domains; homogeneous and inhomogeneous waveguides; concentric and non-concentric ge…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Ph.D. Thesis in Electrical Engineering at the Pontifical Catholic University of Rio de Janeiro

    Journal ref: Maxwell, PUC-Rio, 2024

  12. arXiv:2410.05312  [pdf, other]

    cs.CR cs.AI cs.ET cs.LG cs.NI

    An Intelligent Native Network Slicing Security Architecture Empowered by Federated Learning

    Authors: Rodrigo Moreira, Rodolfo S. Villaca, Moises R. N. Ribeiro, Joberto S. B. Martins, Joao Henrique Correa, Tereza C. Carvalho, Flavio de Oliveira Silva

    Abstract: Network Slicing (NS) has transformed the landscape of resource sharing in networks, offering flexibility to support services and applications with highly variable requirements in areas such as the next-generation 5G/6G mobile networks (NGMN), vehicular networks, industrial Internet of Things (IoT), and verticals. Although significant research and experimentation have driven the development of netw…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 18 pages, 12 figures, Future Generation Computer Systems (FGCS)

    ACM Class: I.2; I.6; F.2.2

    Journal ref: Future Generation Computer Systems (FGCS); ISSN:0167-739X; 2024

  13. arXiv:2409.15515  [pdf, other]

    cs.CL cs.AI

    Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA

    Authors: Nirmal Roy, Leonardo F. R. Ribeiro, Rexhina Blloshmi, Kevin Small

    Abstract: Augmenting Large Language Models (LLMs) with information retrieval capabilities (i.e., Retrieval-Augmented Generation (RAG)) has proven beneficial for knowledge-intensive tasks. However, understanding users' contextual search intent when generating responses is an understudied topic for conversational question answering (QA). This conversational extension leads to additional concerns when compared…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Accepted in EMNLP (findings) 2024

  14. arXiv:2409.14672  [pdf, other]

    cs.AI

    Speechworthy Instruction-tuned Language Models

    Authors: Hyundong Cho, Nicolaas Jedema, Leonardo F. R. Ribeiro, Karishma Sharma, Pedro Szekely, Alessandro Moschitti, Ruben Janssen, Jonathan May

    Abstract: Current instruction-tuned language models are exclusively trained with textual preference data and thus are often not aligned with the unique requirements of other modalities, such as speech. To better align language models with the speech domain, we explore (i) prompting strategies grounded in radio-industry best practices and (ii) preference learning using a novel speech-based preference data of…

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024

  15. arXiv:2409.07220  [pdf, other]

    cs.CV

    Watchlist Challenge: 3rd Open-set Face Detection and Identification

    Authors: Furkan Kasım, Terrance E. Boult, Rensso Mora, Bernardo Biesseck, Rafael Ribeiro, Jan Schlueter, Tomáš Repák, Rafael Henrique Vareto, David Menotti, William Robson Schwartz, Manuel Günther

    Abstract: In the current landscape of biometrics and surveillance, the ability to accurately recognize faces in uncontrolled settings is paramount. The Watchlist Challenge addresses this critical need by focusing on face detection and open-set identification in real-world surveillance scenarios. This paper presents a comprehensive evaluation of participating algorithms, using the enhanced UnConstrained Coll…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted for presentation at IJCB 2024

  16. Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution

    Authors: Marcelo dos Santos, Rayson Laroca, Rafael O. Ribeiro, João C. Neves, David Menotti

    Abstract: Super-resolution algorithms often struggle with images from surveillance environments due to adverse conditions such as unknown degradation, variations in pose, irregular illumination, and occlusions. However, acquiring multiple images, even of low quality, is possible with surveillance cameras. In this work, we develop an algorithm based on diffusion models that utilize a low-resolution image com…

    Submitted 20 October, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2024

  17. Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach

    Authors: Valfride Nascimento, Rayson Laroca, Rafael O. Ribeiro, William Robson Schwartz, David Menotti

    Abstract: Despite significant advancements in License Plate Recognition (LPR) through deep learning, most improvements rely on high-resolution images with clear characters. This scenario does not reflect real-world conditions where traffic surveillance often captures low-resolution and blurry images. Under these conditions, characters tend to blend with the background or neighboring characters, making accur…

    Submitted 20 October, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2024

  18. arXiv:2408.07457  [pdf, ps, other]

    cs.CL

    From Brazilian Portuguese to European Portuguese

    Authors: João Sanches, Rui Ribeiro, Luísa Coheur

    Abstract: Brazilian Portuguese and European Portuguese are two varieties of the same language and, despite their close similarities, they exhibit several differences. However, there is a significant disproportion in the availability of resources between the two variants, with Brazilian Portuguese having more abundant resources. This inequity can impact the quality of translation services accessible to Europ…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 tables

    MSC Class: 68T50

    ACM Class: I.2.7

  19. arXiv:2407.13945  [pdf, other]

    cs.CL

    FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking

    Authors: Zhuoer Wang, Leonardo F. R. Ribeiro, Alexandros Papangelis, Rohan Mukherjee, Tzu-Yen Wang, Xinyan Zhao, Arijit Biswas, James Caverlee, Angeliki Metallinou

    Abstract: API call generation is the cornerstone of large language models' tool-using ability that provides access to the larger world. However, existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. To address these limitations, we propose an output-side opt…

    Submitted 18 July, 2024; originally announced July 2024.

  20. arXiv:2406.03592  [pdf, other]

    cs.CL cs.AI

    Measuring Retrieval Complexity in Question Answering Systems

    Authors: Matteo Gabburo, Nicolaas Paul Jedema, Siddhant Garg, Leonardo F. R. Ribeiro, Alessandro Moschitti

    Abstract: In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. Our proposed pipeline measures RC…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (findings)

  21. arXiv:2405.12785  [pdf, other]

    cs.AI

    Artificial Intelligence Approaches for Predictive Maintenance in the Steel Industry: A Survey

    Authors: Jakub Jakubowski, Natalia Wojak-Strzelecka, Rita P. Ribeiro, Sepideh Pashami, Szymon Bobek, Joao Gama, Grzegorz J Nalepa

    Abstract: Predictive Maintenance (PdM) emerged as one of the pillars of Industry 4.0, and became crucial for enhancing operational efficiency, making it possible to minimize downtime, extend the lifespan of equipment, and prevent failures. A wide range of PdM tasks can be performed using Artificial Intelligence (AI) methods, which often use data generated from industrial sensors. The steel industry, which is an important…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Preprint submitted to Engineering Applications of Artificial Intelligence

  22. arXiv:2405.05809  [pdf, other]

    cs.LG cs.AI cs.CY

    Aequitas Flow: Streamlining Fair ML Experimentation

    Authors: Sérgio Jesus, Pedro Saleiro, Inês Oliveira e Silva, Beatriz M. Jorge, Rita P. Ribeiro, João Gama, Pedro Bizarro, Rayid Ghani

    Abstract: Aequitas Flow is an open-source framework and toolkit for end-to-end Fair Machine Learning (ML) experimentation, and benchmarking in Python. This package fills integration gaps that exist in other fair ML packages. In addition to the existing audit capabilities in Aequitas, the Aequitas Flow module provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation,…

    Submitted 30 October, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  23. A Multilevel Strategy to Improve People Tracking in a Real-World Scenario

    Authors: Cristiano B. de Oliveira, Joao C. Neves, Rafael O. Ribeiro, David Menotti

    Abstract: The Palácio do Planalto, office of the President of Brazil, was invaded by protesters on January 8, 2023. Surveillance videos taken from inside the building were subsequently released by the Brazilian Supreme Court for public scrutiny. We used segments of such footage to create the UFPR-Planalto801 dataset for people tracking and re-identification in a real-world scenario. This dataset consists of…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted for presentation at the International Conference on Computer Vision Theory and Applications (VISAPP) 2024

    Journal ref: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, 2024

  24. arXiv:2404.14455  [pdf, other]

    cs.LG cs.AI

    A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance

    Authors: João Gama, Rita P. Ribeiro, Saulo Mastelini, Narjes Davarid, Bruno Veloso

    Abstract: Predictive Maintenance applications are increasingly complex, with interactions between many components. Black box models are popular approaches based on deep learning techniques due to their predictive accuracy. This paper proposes a neural-symbolic architecture that uses an online rule-learning algorithm to explain when the black box model predicts failures. The proposed system solves two proble…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 26 pages

  25. arXiv:2404.01790  [pdf, other]

    cs.CV cs.LG

    Super-Resolution Analysis for Landfill Waste Classification

    Authors: Matias Molina, Rita P. Ribeiro, Bruno Veloso, João Gama

    Abstract: Illegal landfills are a critical issue due to their environmental, economic, and public health impacts. This study leverages aerial imagery for environmental crime monitoring. While advances in artificial intelligence and computer vision hold promise, the challenge lies in training models with high-resolution literature datasets and adapting them to open-access low-resolution images. Considering t…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: This article has been accepted by the Symposium on Intelligent Data Analysis (IDA 2024)

  26. arXiv:2404.01701  [pdf, other]

    cs.CL

    On the Role of Summary Content Units in Text Summarization Evaluation

    Authors: Marcel Nawrath, Agnieszka Nowak, Tristan Ratz, Danilo C. Walenta, Juri Opitz, Leonardo F. R. Ribeiro, João Sedoc, Daniel Deutsch, Simon Mille, Yixin Liu, Lining Zhang, Sebastian Gehrmann, Saad Mahamood, Miruna Clinciu, Khyathi Chandu, Yufang Hou

    Abstract: At the heart of the Pyramid evaluation method for text summarization lie human-written summary content units (SCUs). These SCUs are concise sentences that decompose a summary into small facts. Such SCUs can be used to judge the quality of a candidate summary, possibly partially automated via natural language inference (NLI) systems. Interestingly, with the aim to fully automate the Pyramid evaluat…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 Pages, 3 Figures, 3 Tables, camera ready version accepted at NAACL 2024

  27. Logic-based Explanations for Linear Support Vector Classifiers with Reject Option

    Authors: Francisco Mateus Rocha Filho, Thiago Alves Rocha, Reginaldo Pereira Fernandes Ribeiro, Ajalmar Rêgo da Rocha Neto

    Abstract: Support Vector Classifier (SVC) is a well-known Machine Learning (ML) model for linear classification problems. It can be used in conjunction with a reject option strategy to reject instances that are hard to correctly classify and delegate them to a specialist. This further increases the confidence of the model. Given this, obtaining an explanation of the cause of rejection is important to not bl…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 16 pages, submitted to BRACIS 2023 (Brazilian Conference on Intelligent Systems), accepted version published in Intelligent Systems, LNCS, vol 14195

    ACM Class: I.2.4; I.2.6

  28. arXiv:2402.03488  [pdf, other]

    cs.LO cs.PL

    Redex -> Coq: towards a theory of decidability of Redex's reduction semantics

    Authors: Mallku Soldevila, Rodrigo Ribeiro, Beta Ziliani

    Abstract: We propose the first steps in the development of a tool to automate the translation of Redex models into a (hopefully) semantically equivalent model in Coq, and to provide tactics to help in the certification of fundamental properties of such models. The work is heavily based on a model of Redex's semantics developed by Klein et al. By means of a simple generalization of the matching problem in Re…

    Submitted 5 February, 2024; originally announced February 2024.

  29. arXiv:2401.18060  [pdf, ps, other]

    math.NT cs.DM math.CO

    Rarity of the infinite chains in the tree of numerical semigroups

    Authors: Maria Bras-Amorós, Mariana Rosas Ribeiro

    Abstract: We prove that, for each fixed genus, the portion of semigroups of that genus belonging to infinite chains in the semigroup tree approaches 0 as the genus grows to infinity. This means that most numerical semigroups have a finite number of descendants in the semigroup tree. This problem has been open since 2009.

    Submitted 31 January, 2024; originally announced January 2024.

    MSC Class: 68W30; 06F05; 20M14; 05A99

  30. arXiv:2311.18734  [pdf, other]

    math.PR cs.DS physics.chem-ph physics.soc-ph

    Structural results for the Tree Builder Random Walk

    Authors: Janos Engländer, Giulio Iacobelli, Gábor Pete, Rodrigo Ribeiro

    Abstract: We study the Tree Builder Random Walk: a randomly growing tree, built by a walker as she is walking around the tree. Namely, at each time $n$, she adds a leaf to her current vertex with probability $p_n \asymp n^{-\gamma}$, $\gamma \in (2/3,1]$, then moves to a uniform random neighbor on the possibly modified tree. We show that the tree process at its growth times, after a random finite number of steps, can…

    Submitted 6 December, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Final version accepted for publication at Annals of Applied Probability

    MSC Class: 05C81; 05C80; 60F99; 60J05

  31. arXiv:2310.10623  [pdf, other]

    cs.CL cs.AI cs.LG

    Generating Summaries with Controllable Readability Levels

    Authors: Leonardo F. R. Ribeiro, Mohit Bansal, Markus Dreyer

    Abstract: Readability refers to how easily a reader can understand a written text. Several factors affect the readability level, such as the complexity of the text, its subject matter, and the reader's background knowledge. Generating summaries based on different readability levels is critical for enabling knowledge consumption by diverse audiences. However, current text generation approaches lack refined c…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted as an EMNLP 2023 main paper

  32. arXiv:2310.09916  [pdf, other]

    cs.RO

    Socially reactive navigation models for mobile robots in dynamic environments

    Authors: Ricarte Ribeiro, Plinio Moreno

    Abstract: The objective of this work is to expand upon previous works, considering socially acceptable behaviours within robot navigation and interaction, and allow a robot to closely approach static and dynamic individuals or groups. The space models developed in this dissertation are adaptive, that is, capable of changing over time to accommodate the changing circumstances often existent within a social e…

    Submitted 15 October, 2023; originally announced October 2023.

  33. arXiv:2309.09326  [pdf, other]

    cs.LG

    Experiential-Informed Data Reconstruction for Fishery Sustainability and Policies in the Azores

    Authors: Brenda Nogueira, Gui M. Menezes, Nuno Moniz, Rita P. Ribeiro

    Abstract: Fishery analysis is critical in maintaining the long-term sustainability of species and the livelihoods of millions of people who depend on fishing for food and income. The fishing gear, or metier, is a key factor significantly impacting marine habitats, selectively targeting species and fish sizes. Analysis of commercial catches or landings by metier in fishery stock assessment and management is…

    Submitted 13 October, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

  34. arXiv:2309.04292  [pdf, other]

    cs.CL cs.AI

    Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations

    Authors: Patrícia Pereira, Rui Ribeiro, Helena Moniz, Luisa Coheur, Joao Paulo Carvalho

    Abstract: Fuzzy Fingerprints have been successfully used as an interpretable text classification technique, but, like most other techniques, have been largely surpassed in performance by Large Pre-trained Language Models, such as BERT or RoBERTa. These models deliver state-of-the-art results in several Natural Language Processing tasks, namely Emotion Recognition in Conversations (ERC), but suffer from the…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: FUZZ-IEEE 2023

  35. Enhancing Network Slicing Architectures with Machine Learning, Security, Sustainability and Experimental Networks Integration

    Authors: Joberto S. B. Martins, Tereza C. Carvalho, Rodrigo Moreira, Cristiano Both, Adnei Donatti, João H. Corrêa, José A. Suruagy, Sand L. Corrêa, Antonio J. G. Abelem, Moisés R. N. Ribeiro, Jose-Marcos Nogueira, Luiz C. S. Magalhães, Juliano Wickboldt, Tiago Ferreto, Ricardo Mello, Rafael Pasquini, Marcos Schwarz, Leobino N. Sampaio, Daniel F. Macedo, José F. de Rezende, Kleber V. Cardoso, Flávio O. Silva

    Abstract: Network Slicing (NS) is an essential technique extensively used in 5G networks computing strategies, mobile edge computing, mobile cloud computing, and verticals like the Internet of Vehicles and industrial IoT, among others. NS is foreseen as one of the leading enablers for 6G futuristic and highly demanding applications since it allows the optimization and customization of scarce and disputed re…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 10 pages, 11 figures

    ACM Class: I.2.1; C.2.1; C.2.3

    Journal ref: IEEE ACCESS 2023

  36. Reconstructing Spatiotemporal Data with C-VAEs

    Authors: Tiago F. R. Ribeiro, Fernando Silva, Rogério Luís de C. Costa

    Abstract: The continuous representation of spatiotemporal data commonly relies on using abstract data types, such as moving regions, to represent entities whose shape and position continuously change over time. Creating this representation from discrete snapshots of real-world entities requires using interpolation methods to compute in-between data representations and estimate the position and shap…

    Submitted 28 August, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Update acknowledgments to include published article information

    Journal ref: Advances in Databases and Information Systems 13985 (2023) 59-73

  37. arXiv:2306.07688  [pdf, other]

    cs.RO

    Mobility Strategy of Multi-Limbed Climbing Robots for Asteroid Exploration

    Authors: Warley F. R. Ribeiro, Kentaro Uno, Masazumi Imai, Koki Murase, Barış Can Yalçın, Matteo El Hariry, Miguel A. Olivares-Mendez, Kazuya Yoshida

    Abstract: Mobility on asteroids by multi-limbed climbing robots is expected to achieve our exploration goals in such challenging environments. We propose a mobility strategy to improve the locomotion safety of climbing robots in such harsh environments that feature extremely low gravity and highly uneven terrain. Our method plans the gait by decoupling the base and limbs' movements and adjusting the main bo…

    Submitted 22 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Paper accepted for presentation at the CLAWAR 2023 (26th International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines) (Updated references formatting)

  38. arXiv:2306.05120  [pdf, other]

    cs.AI

    Explainable Predictive Maintenance

    Authors: Sepideh Pashami, Slawomir Nowaczyk, Yuantao Fan, Jakub Jakubowski, Nuno Paiva, Narjes Davari, Szymon Bobek, Samaneh Jamshidi, Hamid Sarmadi, Abdallah Alabdallah, Rita P. Ribeiro, Bruno Veloso, Moamar Sayed-Mouchaweh, Lala Rajaoarisoa, Grzegorz J. Nalepa, João Gama

    Abstract: Explainable Artificial Intelligence (XAI) fills the role of a critical interface fostering interactions between sophisticated intelligent systems and diverse individuals, including data scientists, domain experts, end-users, and more. It aids in deciphering the intricate internal mechanisms of "black box" Machine Learning (ML), rendering the reasons behind their decisions more understandable. Ho…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 51 pages, 9 figures

    ACM Class: I.2.1

  39. arXiv:2305.07716  [pdf, other]

    cs.RO cs.AI

    Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning

    Authors: Georgia Chalvatzaki, Ali Younes, Daljeet Nandha, An Le, Leonardo F. R. Ribeiro, Iryna Gurevych

    Abstract: Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM on the domain…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: 21 pages, 6 figures

  40. Embedding Aggregation for Forensic Facial Comparison

    Authors: Rafael Oliveira Ribeiro, João C. R. Neves, Arnout C. C. Ruifrok, Flavio de Barros Vidal

    Abstract: In forensic facial comparison, questioned-source images are usually captured in uncontrolled environments, with non-uniform lighting, and from non-cooperative subjects. The poor quality of such material usually compromises their value as evidence in legal matters. On the other hand, in forensic casework, multiple images of the person of interest are usually available. In this paper, we propose to…

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: 13 pages, 8 figures, submitted to Forensic Science International

    ACM Class: I.4; I.5

  41. arXiv:2304.06634  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    PGTask: Introducing the Task of Profile Generation from Dialogues

    Authors: Rui Ribeiro, Joao P. Carvalho, Luísa Coheur

    Abstract: Recent approaches have attempted to personalize dialogue systems by incorporating profile information into models. However, this knowledge is scarce and difficult to obtain, which makes the extraction/generation of profile information from dialogues a fundamental asset. To surpass this limitation, we introduce the Profile Generation Task (PGTask). We contribute a new dataset for this problem, co…

    Submitted 26 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted at SIGDIAL 2023, 4 pages, 2 figures

  42. arXiv:2303.16151  [pdf, other]

    q-fin.ST cs.LG econ.EM stat.ML

    Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage

    Authors: Rafael Alves, Diego S. de Brito, Marcelo C. Medeiros, Ruy M. Ribeiro

    Abstract: We propose a model to forecast large realized covariance matrices of returns, applying it to the constituents of the S&P 500 daily. To address the curse of dimensionality, we decompose the return covariance matrix using standard firm-level factors (e.g., size, value, and profitability) and use sectoral restrictions in the residual covariance matrix. This restricted model is then estimated using v…

    Submitted 22 March, 2023; originally announced March 2023.

  43. arXiv:2301.07996  [pdf, other]

    cs.RO

    RAMP: Reaction-Aware Motion Planning of Multi-Legged Robots for Locomotion in Microgravity

    Authors: Warley F. R. Ribeiro, Kentaro Uno, Masazumi Imai, Koki Murase, Kazuya Yoshida

    Abstract: Robotic mobility in microgravity is necessary to expand human utilization and exploration of outer space. Bio-inspired multi-legged robots are a possible solution for safe and precise locomotion. However, a dynamic motion of a robot in microgravity can lead to failures due to gripper detachment caused by excessive motion reactions. We propose a novel Reaction-Aware Motion Planning (RAMP) to improv…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: Submitted version of paper accepted for presentation at the 2023 IEEE International Conference on Robotics and Automation (ICRA)

  44. arXiv:2211.13358  [pdf, other]

    cs.LG

    Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

    Authors: Sérgio Jesus, José Pombal, Duarte Alves, André Cruz, Pedro Saleiro, Rita P. Ribeiro, João Gama, Pedro Bizarro

    Abstract: Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase in publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this…

    Submitted 28 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022. https://openreview.net/forum?id=UrAYT2QwOX8

  45. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  46. arXiv:2210.10695  [pdf, other]

    cs.IR cs.CL

    Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking

    Authors: Tim Baumgärtner, Leonardo F. R. Ribeiro, Nils Reimers, Iryna Gurevych

    Abstract: Pairing a lexical retriever with a neural re-ranking model has set state-of-the-art performance on large-scale information retrieval datasets. This pipeline covers scenarios like question answering or navigational queries; however, for information-seeking scenarios, users often provide information on whether a document is relevant to their query in the form of clicks or explicit feedback. Therefore, i…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022

  47. arXiv:2210.06496  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    SUMBot: Summarizing Context in Open-Domain Dialogue Systems

    Authors: Rui Ribeiro, Luísa Coheur

    Abstract: In this paper, we investigate the problem of including relevant information as context in open-domain dialogue systems. Most models struggle to identify and incorporate important knowledge from dialogues and simply use the entire turns as context, which increases the size of the input fed to the model with unnecessary information. Additionally, due to the input size limitation of a few hundred tok…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 4 pages, 3 figures, accepted at IberSPEECH 2022

  48. Face Super-Resolution Using Stochastic Differential Equations

    Authors: Marcelo dos Santos, Rayson Laroca, Rafael O. Ribeiro, João Neves, Hugo Proença, David Menotti

    Abstract: Diffusion models have proven effective for various applications such as image, audio, and graph generation. Other important applications are image super-resolution and the solution of inverse problems. More recently, some works have used stochastic differential equations (SDEs) to generalize diffusion models to continuous time. In this work, we introduce SDEs to generate super-resolution face imag…

    Submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2022

  49. arXiv:2208.09316  [pdf, other]

    cs.CL

    UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

    Authors: Rachneet Sachdeva, Haritz Puerto, Tim Baumgärtner, Sewin Tariverdian, Hao Zhang, Kexin Wang, Hossain Shaikh Saadi, Leonardo F. R. Ribeiro, Iryna Gurevych

    Abstract: Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult for humans to interpret. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in…

    Submitted 20 October, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: Accepted at AACL 2022 as Demo Paper

  50. arXiv:2207.05466  [pdf, other]

    cs.LG cs.AI

    A Benchmark dataset for predictive maintenance

    Authors: Bruno Veloso, João Gama, Rita P. Ribeiro, Pedro M. Pereira

    Abstract: The paper describes the MetroPT data set, an outcome of an eXplainable Predictive Maintenance (XPM) project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to evaluate machine learning methods for online anomaly detection and failure prediction. By capturing several analog sensor signals (pressure, temperature, current consumption),…

    Submitted 18 July, 2022; v1 submitted 12 July, 2022; originally announced July 2022.