Search | arXiv e-print repository

A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic

Authors: Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash

Abstract: Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script by Jewish writers and for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew and Aramaic. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: simple character-level mapping followed by post-correction to address grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts.

Submitted 7 July, 2025; originally announced July 2025.

arXiv:2507.00999 [pdf, ps, other]

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America

Authors: María Grandury, Javier Aula-Blasco, Júlia Falcão, Clémentine Fourrier, Miguel González, Gonzalo Martínez, Gonzalo Santamaría, Rodrigo Agerri, Nuria Aldama, Luis Chiruzzo, Javier Conde, Helena Gómez, Marta Guerrero, Guido Ivetta, Natalia López, Flor Miriam Plaza-del-Arco, María Teresa Martín-Valdivia, Helena Montoro, Carmen Muñoz, Pedro Reviriego, Leire Rosado, Alejandro Vaca, María Estrella Vallecillo-Rodríguez, Jorge Vallego, Irune Zubiaga

Abstract: Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Basque, Catalan, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.

Submitted 1 July, 2025; originally announced July 2025.

Comments: Accepted at ACL 2025 Main

arXiv:2506.22439 [pdf, ps, other]

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans

Authors: Javier Conde, Miguel González, María Grandury, Gonzalo Martínez, Pedro Reviriego, Mar Brysbaert

Abstract: The evaluation of LLMs has so far focused primarily on how well they can perform different tasks such as reasoning, question-answering, paraphrasing, or translating. For most of these tasks, performance can be measured with objective metrics, such as the number of correct answers. However, other language features are not easily quantified. Examples include the arousal, concreteness, or gender associated with a given word, as well as the extent to which we experience words with the senses and relate them to a specific sense. Those features have been studied for many years in psycholinguistics, through large-scale experiments with humans that produce ratings for thousands of words. This opens an opportunity to evaluate how well LLMs align with human ratings on these word features, taking advantage of existing studies that cover many different language features in a large number of words. In this paper, we evaluate the alignment of a representative group of LLMs with human ratings on two psycholinguistic datasets: the Glasgow and Lancaster norms. These datasets cover thirteen features over thousands of words. The results show that alignment is generally better on the Glasgow norms evaluated (arousal, valence, dominance, concreteness, imageability, familiarity, and gender) than on the Lancaster norms evaluated (interoceptive, gustatory, olfactory, haptic, auditory, and visual). This suggests a potential limitation of current LLMs in aligning with human sensory associations for words, which may be due to their lack of the embodied cognition present in humans, and illustrates the usefulness of evaluating LLMs with psycholinguistic datasets.

Submitted 29 May, 2025; originally announced June 2025.

Comments: Accepted for the GEM2 workshop at ACL 2025

arXiv:2506.17989 [pdf, ps, other]

Data Curation Matters: Model Collapse and Spurious Shift Performance Prediction from Training on Uncurated Text Embeddings

Authors: Lucas Mattioli, Youness Ait Hadichou, Sabrina Chaouche, Martin Gonzalez

Abstract: Training models on uncurated Text Embeddings (TEs) derived from raw tabular data can lead to a severe failure mode known as model collapse, where predictions converge to a single class regardless of input. By comparing models trained with identical hyper-parameter configurations on both raw tabular data and their TE-derived counterparts, we find that collapse is a consistent failure mode in the latter setting. We introduce a set of metrics that capture the extent of model collapse, offering a new perspective on TE quality as a proxy for data curation. Our results reveal that TEs alone do not effectively function as a curation layer, and that their quality significantly influences downstream learning. More insidiously, we observe that the presence of model collapse can yield artificially inflated and spurious Accuracy-on-the-Line correlation. These findings highlight the need for more nuanced curation and evaluation of embedding-based representations, particularly in out-of-distribution settings.

Submitted 22 June, 2025; originally announced June 2025.

Comments: 37 pages. Multiple figures

arXiv:2505.24802 [pdf, ps, other]

ByzFL: Research Framework for Robust Federated Learning

Authors: Marc González, Rachid Guerraoui, Rafael Pinot, Geovani Rizk, John Stephan, François Taïani

Abstract: We present ByzFL, an open-source Python library for developing and benchmarking robust federated learning (FL) algorithms. ByzFL provides a unified and extensible framework that includes implementations of state-of-the-art robust aggregators, a suite of configurable attacks, and tools for simulating a variety of FL scenarios, including heterogeneous data distributions, multiple training algorithms, and adversarial threat models. The library enables systematic experimentation via a single JSON-based configuration file and includes built-in utilities for result visualization. Compatible with PyTorch tensors and NumPy arrays, ByzFL is designed to facilitate reproducible research and rapid prototyping of robust FL solutions. ByzFL is available at https://byzfl.epfl.ch/, with source code hosted on GitHub: https://github.com/LPD-EPFL/byzfl.

Submitted 30 May, 2025; originally announced May 2025.

arXiv:2505.18978 [pdf, other]

AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models

Authors: Miguel Angel Peñaloza Perez, Bruno Lopez Orozco, Jesus Tadeo Cruz Soto, Michelle Bruno Hernandez, Miguel Angel Alvarado Gonzalez, Sandra Malagon

Abstract: Existing mathematical reasoning benchmarks are predominantly English-only or translation-based, which can introduce semantic drift and mask language-specific reasoning errors. To address this, we present AI4Math, a benchmark of 105 original university-level math problems natively authored in Spanish. The dataset spans seven advanced domains (Algebra, Calculus, Geometry, Probability, Number Theory, Combinatorics, and Logic), and each problem is accompanied by a step-by-step human solution. We evaluate six large language models (GPT-4o, GPT-4o mini, o3-mini, LLaMA 3.3 70B, DeepSeek R1 685B, and DeepSeek V3 685B) under four configurations: zero-shot and chain-of-thought, each in Spanish and English. The top models (o3-mini, DeepSeek R1 685B, DeepSeek V3 685B) achieve over 70% accuracy, whereas LLaMA 3.3 70B and GPT-4o mini remain below 40%. Most models show no significant performance drop between languages, with GPT-4o even performing better on Spanish problems in the zero-shot setting. Geometry, Combinatorics, and Probability questions remain persistently challenging for all models. These results highlight the need for native-language benchmarks and domain-specific evaluations to reveal reasoning failures not captured by standard metrics.

Submitted 25 May, 2025; originally announced May 2025.

Comments: 36 pages, 5 figures

MSC Class: 68; ACM Class: I.2

arXiv:2505.15988 [pdf, ps, other]

An Ecosystem of Services for FAIR Computational Workflows

Authors: Sean R. Wilkinson, Johan Gustafsson, Finn Bacall, Khalid Belhajjame, Salvador Capella, Jose Maria Fernandez Gonzalez, Jacob Fosso Tande, Luiz Gadelha, Daniel Garijo, Patricia Grubel, Bjorn Grüning, Farah Zaib Khan, Sehrish Kanwal, Simone Leo, Stuart Owen, Luca Pireddu, Line Pouchard, Laura Rodríguez-Navas, Beatriz Serrano-Solano, Stian Soiland-Reyes, Baiba Vilne, Alan Williams, Merridee Ann Wouters, Frederik Coppens, Carole Goble

Abstract: Computational workflows, regardless of their portability or maturity, represent major investments of both effort and expertise. They are first-class, publishable research objects in their own right. They are key to sharing methodological know-how for reuse, reproducibility, and transparency. Consequently, the application of the FAIR principles to workflows is inevitable to enable them to be Findable, Accessible, Interoperable, and Reusable. Making workflows FAIR would reduce duplication of effort, assist in the reuse of best practice approaches and community-supported standards, and ensure that workflows as digital objects can support reproducible and robust science. FAIR workflows also encourage interdisciplinary collaboration, enabling workflows developed in one field to be repurposed and adapted for use in other research domains. FAIR workflows draw from both FAIR data and software principles. Workflows propose explicit method abstractions and tight bindings to data, hence making many of the data principles apply. Meanwhile, as executable pipelines with a strong emphasis on code composition and data flow between steps, the software principles apply, too. As workflows are chiefly concerned with the processing and creation of data, they also have an important role to play in ensuring and supporting data FAIRification. The FAIR Principles for software and data mandate the use of persistent identifiers (PIDs) and machine-actionable metadata associated with workflows to enable findability, accessibility, interoperability, and reusability. Implementing the principles requires a PID and metadata framework with appropriate programmatic protocols, an accompanying ecosystem of services, tools, guidelines, policies, and best practices, as well as the buy-in of existing workflow systems such that they adapt in order to adopt. The European EOSC-Life Workflow Collaboratory is an example of such a ...

Submitted 21 May, 2025; originally announced May 2025.

Comments: 41 pages, 4 figures, 3 tables; to appear as chapter in upcoming book

arXiv:2505.12145 [pdf, ps, other]

Trajectory-Integrated Accessibility Analysis of Public Electric Vehicle Charging Stations

Authors: Yi Ju, Jiaman Wu, Zhihan Su, Lunlong Li, Jinhua Zhao, Marta C. González, Scott J. Moura

Abstract: Electric vehicle (EV) charging infrastructure is crucial for advancing EV adoption, managing charging loads, and ensuring equitable transportation electrification. However, there remains a notable gap in comprehensive accessibility metrics that integrate the mobility of the users. This study introduces a novel accessibility metric, termed Trajectory-Integrated Public EVCS Accessibility (TI-acs), and uses it to assess public electric vehicle charging station (EVCS) accessibility for approximately 6 million residents in the San Francisco Bay Area based on detailed individual trajectory data over one week. Unlike conventional home-based metrics, TI-acs incorporates the accessibility of EVCS along individuals' travel trajectories, bringing insights on more public charging contexts, including public charging near workplaces and charging during grid off-peak periods. As of June 2024, given the current public EVCS network, Bay Area residents have, on average, 7.5 hours and 5.2 hours of access per day during which their stay locations are within 1 km (i.e., 10-12 min walking) of a public L2 and DCFC charging port, respectively. Over the past decade, TI-acs has steadily increased owing to the rapid expansion of the EV market and charging infrastructure. However, spatial disparities remain significant, as reflected in Gini indices of 0.38 (L2) and 0.44 (DCFC) across census tracts. Additionally, our analysis reveals racial disparities in TI-acs, driven not only by variations in charging infrastructure near residential areas but also by differences in residents' mobility patterns.

Submitted 17 May, 2025; originally announced May 2025.

Comments: 19 pages, 8 figures

arXiv:2505.10862 [pdf, ps, other]

Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?

Authors: Tairan Fu, Miguel González, Javier Conde, Elena Merino-Gómez, Pedro Reviriego

Abstract: Multimodal Large Language Models (MLLMs), which can answer complex questions about an image, struggle to tell the time on analog clocks. This is probably due to the lack of images with clocks at different times in their training sets. In this work we explore this issue with one of the latest MLLMs, GPT-4.1, to understand why MLLMs fail to tell the time and whether fine-tuning can solve the problem. The results show how models are making progress in reading the time on analog clocks. But have they really learned to do it, or have they only learned patterns in their training datasets? In this work we put the models to the test with different clocks to illustrate the limitations of MLLMs in abstracting and generalizing.

Submitted 16 May, 2025; originally announced May 2025.

Comments: 6 pages, 5 figures, 2 tables

ACM Class: I.2.7

arXiv:2505.09319 [pdf, other]

Statistical Modeling and Uncertainty Estimation of LLM Inference Systems

Authors: Kaustabha Ray, Nelson Mimura Gonzalez, Bruno Wassermann, Rachel Tzoref-Brill, Dean H. Lorenz

Abstract: Large Language Model (LLM) inference systems present significant challenges in statistical performance characterization due to dynamic workload variations, diverse hardware architectures, and complex interactions between model size, batch processing, and throughput requirements. Accurate statistical characterization enables better workload scheduling, adaptive resource provisioning, and cost-aware inference optimization, making it crucial for improving efficiency in large-scale AI deployments. Traditional analytical models provide explainability but cannot cover the vast diversity of real-world workloads, making it impossible to benchmark every scenario in advance. Machine learning (ML) approaches effectively predict performance for non-benchmarked cases but struggle when extrapolating beyond their observed training space. To address these limitations for LLM inference systems, we propose an Analytical with Learning Augmentation (ALA) framework that bridges analytical modeling with ML for robust statistical prediction and uncertainty estimation in LLM inference workloads. Our method employs an analytical throughput model with parameters estimated for benchmarked workloads, then extends to unobserved configurations using ML predictions. We enhance this with simulated annealing to exploit subsets of the workload data point combinations and develop an error predictor. Finally, we quantify uncertainty based on vector space similarity between new and observed workloads to ensure robust generalization. Through extensive experimentation on diverse LLM inference workloads, we demonstrate that our framework achieves low median errors while maintaining adaptability to new inference scenarios.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.05331 [pdf, ps, other]

Aesthetics Without Semantics

Authors: C. Alejandro Parraga, Olivier Penacchio, Marcos Muňoz Gonzalez, Bogdan Raducanu, Xavier Otazu

Abstract: While it is easy for human observers to judge an image as beautiful or ugly, aesthetic decisions result from a combination of entangled perceptual and cognitive (semantic) factors, making the understanding of aesthetic judgements particularly challenging from a scientific point of view. Furthermore, our research shows a prevailing bias in current databases, which include mostly beautiful images, further complicating the study and prediction of aesthetic responses. We address these limitations by creating a database of images with minimal semantic content and devising, and next exploiting, a method to generate images on the ugly side of aesthetic valuations. The resulting Minimum Semantic Content (MSC) database consists of a large and balanced collection of 10,426 images, each evaluated by 100 observers. We next use established image metrics to demonstrate how augmenting an image set biased towards beautiful images with ugly images can modify, or even invert, an observed relationship between image features and aesthetics valuation. Taken together, our study reveals that works in empirical aesthetics attempting to link image content and aesthetic judgements may magnify, underestimate, or simply miss interesting effects due to a limitation of the range of aesthetic values they consider.

Submitted 12 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

Comments: Parts of this work were presented in abstract format at the Vision Science of Art Conference (VSAC2016), the Iberian Conference on Perception (CIP2022), and the European Conference on Visual Perception (ECVP2022). See Perception 51, No. 1 (Suppl.), p. 139, 2022.

arXiv:2504.16609 [pdf, other]

Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks

Authors: Antonios Tragoudaras, Theofanis Aslanidis, Emmanouil Georgios Lionis, Marina Orozco González, Panagiotis Eustratiadis

Abstract: Text data are often encoded as dense vectors, known as embeddings, which capture semantic, syntactic, contextual, and domain-specific information. These embeddings, widely adopted in various applications, inherently contain rich information that may be susceptible to leakage under certain attacks. The GEIA framework highlights vulnerabilities in sentence embeddings, demonstrating that they can reveal the original sentences they represent. In this study, we reproduce GEIA's findings across various neural sentence embedding models. Additionally, we contribute new analysis to examine whether these models leak sensitive information from their training datasets. We propose a simple yet effective method that requires no modification to the attacker's architecture proposed in GEIA. The key idea is to examine differences between the log-likelihoods of masked and original variants of data that sentence embedding models have been pre-trained on, calculated on the embedding space of the attacker. Our findings indicate that, following our approach, an adversarial party can recover meaningful sensitive information related to the pre-training knowledge of the popular models used for creating sentence embeddings, seriously undermining their security. Our code is available at: https://github.com/taslanidis/GEIA

Submitted 23 April, 2025; originally announced April 2025.

Comments: This is a preprint of our paper accepted at SIGIR 2025

arXiv:2504.01208 [pdf, other]

Lightweight Deep Models for Dermatological Disease Detection: A Study on Instance Selection and Channel Optimization

Authors: Ian Mateos Gonzalez, Estefani Jaramilla Nava, Abraham Sánchez Morales, Jesús García-Ramírez, Ricardo Ramos-Aguilar

Abstract: The identification of dermatological disease is an important problem in Mexico according to several studies. Several works in the literature use datasets from different repositories without studying the behavior of the data, especially in the medical imaging domain. In this work, we propose a methodology to preprocess the DermaMNIST dataset in order to improve its quality for the classification stage, where we use lightweight convolutional neural networks. In our results, we reduce the number of instances used for neural network training while obtaining performance similar to that of models such as ResNet.

Submitted 1 April, 2025; originally announced April 2025.

Comments: Submitted to Mexican Conference on Pattern Recognition 2025

arXiv:2504.00979 [pdf]

Artificial Intelligence-Assisted Prostate Cancer Diagnosis for Reduced Use of Immunohistochemistry

Authors: Anders Blilie, Nita Mulliqi, Xiaoyi Ji, Kelvin Szolnoky, Sol Erika Boman, Matteo Titus, Geraldine Martinez Gonzalez, José Asenjo, Marcello Gambacorta, Paolo Libretti, Einar Gudlaugsson, Svein R. Kjosavik, Lars Egevad, Emiel A. M. Janssen, Martin Eklund, Kimmo Kartasalo

Abstract: Prostate cancer diagnosis heavily relies on histopathological evaluation, which is subject to variability. While immunohistochemical staining (IHC) assists in distinguishing benign from malignant tissue, it involves increased work, higher costs, and diagnostic delays. Artificial intelligence (AI) presents a promising solution to reduce reliance on IHC by accurately classifying atypical glands and borderline morphologies in hematoxylin & eosin (H&E) stained tissue sections. In this study, we evaluated an AI model's ability to minimize IHC use without compromising diagnostic accuracy by retrospectively analyzing prostate core needle biopsies from routine diagnostics at three different pathology sites. These cohorts were composed exclusively of difficult cases where the diagnosing pathologists required IHC to finalize the diagnosis. The AI model demonstrated area under the curve values of 0.951-0.993 for detecting cancer in routine H&E-stained slides. Applying sensitivity-prioritized diagnostic thresholds reduced the need for IHC staining by 44.4%, 42.0%, and 20.7% in the three cohorts investigated, without a single false negative prediction. This AI model shows potential for optimizing IHC use, streamlining decision-making in prostate pathology, and alleviating resource burdens.

Submitted 31 March, 2025; originally announced April 2025.

Comments: 29 pages, 5 figures and 3 tables

arXiv:2502.21264 [pdf]

Foundation Models -- A Panacea for Artificial Intelligence in Pathology?

Authors: Nita Mulliqi, Anders Blilie, Xiaoyi Ji, Kelvin Szolnoky, Henrik Olsson, Sol Erika Boman, Matteo Titus, Geraldine Martinez Gonzalez, Julia Anna Mielcarz, Masi Valkonen, Einar Gudlaugsson, Svein R. Kjosavik, José Asenjo, Marcello Gambacorta, Paolo Libretti, Marcin Braun, Radzislaw Kordek, Roman Łowicki, Kristina Hotakainen, Päivi Väre, Bodil Ginnerup Pedersen, Karina Dalsgaard Sørensen, Benedicte Parm Ulhøi, Pekka Ruusuvuori, Brett Delahunt, et al. (6 additional authors not shown)

Abstract: The role of artificial intelligence (AI) in pathology has evolved from aiding diagnostics to uncovering predictive morphological patterns in whole slide images (WSIs). Recently, foundation models (FMs) leveraging self-supervised pre-training have been widely advocated as a universal solution for diverse downstream tasks. However, open questions remain about their clinical applicability and generalization advantages over end-to-end learning using task-specific (TS) models. Here, we focused on AI with clinical-grade performance for prostate cancer diagnosis and Gleason grading. We present the largest validation of AI for this task, using over 100,000 core needle biopsies from 7,342 patients across 15 sites in 11 countries. We compared two FMs with a fully end-to-end TS model in a multiple instance learning framework. Our findings challenge assumptions that FMs universally outperform TS models. While FMs demonstrated utility in data-scarce scenarios, their performance converged with - and was in some cases surpassed by - TS models when sufficient labeled training data were available. Notably, extensive task-specific training markedly reduced clinically significant misgrading, misdiagnosis of challenging morphologies, and variability across different WSI scanners. Additionally, FMs used up to 35 times more energy than the TS model, raising concerns about their sustainability. Our results underscore that while FMs offer clear advantages for rapid prototyping and research, their role as a universal solution for clinically applicable medical AI remains uncertain. For high-stakes clinical applications, rigorous validation and consideration of task-specific training remain critically important. We advocate for integrating the strengths of FMs and end-to-end learning to achieve robust and resource-efficient AI pathology solutions fit for clinical use.

Submitted 3 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

Comments: 50 pages, 15 figures and an appendix (study protocol) which is previously published, see https://doi.org/10.1101/2024.07.04.24309948; updated authors list format

arXiv:2502.16721 [pdf]

doi 10.1109/MC.2024.3399384

Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Authors: Javier Conde, Miguel González, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi

Abstract: The speed of open-weights large language models (LLMs) and its dependency on the task at hand, when run on GPUs, is studied to present a comparative analysis of the speed of the most popular open LLMs.

Submitted 23 February, 2025; originally announced February 2025.

Journal ref: Computer (Volume: 57, Issue: 8, August 2024)

arXiv:2502.13231 [pdf, other]

Las funciones booleanas y el lema de Bonami

Authors: María José González, Paul MacManus, María Cristina Pereyra

Abstract: In this expository article, we study the relation between the boolean functions and the hypercontractivity theorems of Aline Bonami. We focus on the social choice theory, and present some of the most important results in the area, such as the Friedgut-Kalai-Naor (FKN) and the Kahn-Kalai-Linial (KKL) theorems, and the famous Fourier Entropy/Influence conjecture. -- En este artículo expositivo estudiamos la relación entre las funciones booleanas y los teoremas de hipercontractividad de Aline Bonami. Nos concentramos en la teoría de la elección social, y presentamos algunos de los resultados más importantes en el área como los teoremas de Friedgut-Kalai-Naor (FKN) y de Kahn-Kalai-Linial (KKL), y la famosa conjetura Entropía de Fourier/Influencia.

Submitted 18 February, 2025; originally announced February 2025.

Comments: 37 pages, in Spanish, 2 photos

MSC Class: 43.01; 68R01

Journal ref: La Gaceta de la RSME, Vol. 28 (2025), Núm. 1, Págs. 51-87

arXiv:2501.06561 [pdf, other]

Where to Go Next Day: Multi-scale Spatial-Temporal Decoupled Model for Mid-term Human Mobility Prediction

Authors: Zongyuan Huang, Weipeng Wang, Shaoyu Huang, Marta C. Gonzalez, Yaohui Jin, Yanyan Xu

Abstract: Predicting individual mobility patterns is crucial across various applications. While current methods mainly focus on predicting the next location for personalized services like recommendations, they often fall short in supporting broader applications such as traffic management and epidemic control, which require longer period forecasts of human mobility. This study addresses mid-term mobility prediction, aiming to capture daily travel patterns and forecast trajectories for the upcoming day or week. We propose a novel Multi-scale Spatial-Temporal Decoupled Predictor (MSTDP) designed to efficiently extract spatial and temporal information by decoupling daily trajectories into distinct location-duration chains. Our approach employs a hierarchical encoder to model multi-scale temporal patterns, including daily recurrence and weekly periodicity, and utilizes a transformer-based decoder to globally attend to predicted information in the location or duration chain. Additionally, we introduce a spatial heterogeneous graph learner to capture multi-scale spatial relationships, enhancing semantic-rich representations. Extensive experiments, including statistical physics analysis, are conducted on large-scale mobile phone records in five cities (Boston, Los Angeles, SF Bay Area, Shanghai, and Tokyo), to demonstrate MSTDP's advantages. Applied to epidemic modeling in Boston, MSTDP significantly outperforms the best-performing baseline, achieving a remarkable 62.8% reduction in MAE for cumulative new cases.

Submitted 11 January, 2025; originally announced January 2025.

arXiv:2412.17440 [pdf, other]

The Role of XAI in Transforming Aeronautics and Aerospace Systems

Authors: Francisco Javier Cantero Zorita, Mikel Galafate, Javier M. Moguerza, Isaac Martín de Diego, M. Teresa Gonzalez, Gema Gutierrez Peña

Abstract: Recent advancements in Artificial Intelligence (AI) have transformed decision-making in aeronautics and aerospace. These advancements in AI have brought with them the need to understand the reasons behind the predictions generated by AI systems and models, particularly by professionals in these sectors. In this context, the emergence of eXplainable Artificial Intelligence (XAI) has helped bridge the gap between professionals in the aeronautical and aerospace sectors and the AI systems and models they work with. For this reason, this paper reviews the concept of XAI, defining the term and the objectives it aims to achieve. Additionally, the paper discusses the types of models defined within it and the properties these models must fulfill to be considered transparent, as well as the post-hoc techniques used to understand AI systems and models after their training. Finally, various application areas within the aeronautical and aerospace sectors will be presented, highlighting how XAI is used in these fields to help professionals understand the functioning of AI systems and models.

Submitted 23 December, 2024; originally announced December 2024.

arXiv:2410.17648 [pdf, other]

Towards Active Participant Centric Vertical Federated Learning: Some Representations May Be All You Need

Authors: Jon Irureta, Jon Imaz, Aizea Lojo, Javier Fernandez-Marques, Marco González, Iñigo Perona

Abstract: Existing Vertical FL (VFL) methods often struggle with realistic and unaligned data partitions, and incur high communication costs and significant operational complexity. This work introduces a novel approach to VFL, Active Participant Centric VFL (APC-VFL), that excels in scenarios where data samples among participants are partially aligned at training. Among its strengths, APC-VFL only requires a single communication step with the active participant. This is made possible through a local and unsupervised representation learning stage at each participant followed by a knowledge distillation step in the active participant. Compared to other VFL methods such as SplitNN or VFedTrans, APC-VFL consistently outperforms them across three popular VFL datasets in terms of F1, accuracy and communication costs as the ratio of aligned data is reduced.

Submitted 19 February, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

arXiv:2407.18745 [pdf, other]

FairAIED: Navigating Fairness, Bias, and Ethics in Educational AI Applications

Authors: Sribala Vidyadhari Chinta, Zichong Wang, Zhipeng Yin, Nhat Hoang, Matthew Gonzalez, Tai Le Quy, Wenbin Zhang

Abstract: The integration of Artificial Intelligence (AI) into education has transformative potential, providing tailored learning experiences and creative instructional approaches. However, the inherent biases in AI algorithms hinder this improvement by unintentionally perpetuating prejudice against specific demographics, especially in human-centered applications like education. This survey delves deeply into the developing topic of algorithmic fairness in educational contexts, providing a comprehensive evaluation of the diverse literature on fairness, bias, and ethics in AI-driven educational applications. It identifies the common forms of biases, such as data-related, algorithmic, and user-interaction, that fundamentally undermine the accomplishment of fairness in AI teaching aids. By outlining existing techniques for mitigating these biases, ranging from varied data gathering to algorithmic fairness interventions, the survey emphasizes the critical role of ethical considerations and legal frameworks in shaping a more equitable educational environment. Furthermore, it guides readers through the complexities of fairness measurements, methods, and datasets, shedding light on the way to bias reduction. Despite these gains, this survey highlights long-standing issues, such as achieving a balance between fairness and accuracy, as well as the need for diverse datasets. Overcoming these challenges and ensuring the ethical and fair use of AI's promise in education call for a collaborative, interdisciplinary approach.

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2407.09549 [pdf]

doi 10.1007/s00146-025-02351-5

Recursive InPainting (RIP): how much information is lost under recursive inferences?

Authors: Javier Conde, Miguel González, Gonzalo Martínez, Fernando Moral, Elena Merino-Gómez, Pedro Reviriego

Abstract: The rapid adoption of generative artificial intelligence (AI) is accelerating content creation and modification. For example, variations of a given content, be it text or images, can be created almost instantly and at a low cost. This will soon lead to the majority of text and images being created directly by AI models or by humans assisted by AI. This poses new risks; for example, AI-generated content may be used to train newer AI models and degrade their performance, or information may be lost in the transformations made by AI which could occur when the same content is processed over and over again by AI tools. An example of AI image modifications is inpainting in which an AI model completes missing fragments of an image. The incorporation of inpainting tools into photo editing programs promotes their adoption and encourages their recursive use to modify images. Inpainting can be applied recursively, starting from an image, removing some parts, applying inpainting to reconstruct the image, revising it, and then starting the inpainting process again on the reconstructed image, etc. This paper presents an empirical evaluation of recursive inpainting when using one of the most widely used image models: Stable Diffusion. The inpainting process is applied by randomly selecting a fragment of the image, reconstructing it, selecting another fragment, and repeating the process a predefined number of iterations. The images used in the experiments are taken from a publicly available art data set and correspond to different styles and historical periods. Additionally, photographs are also evaluated as a reference. The modified images are compared with the original ones by both using quantitative metrics and performing a qualitative analysis. The results show that recursive inpainting in some cases modifies the image so that it still resembles the original one while in others leads to degeneration.

Submitted 25 May, 2025; v1 submitted 27 June, 2024; originally announced July 2024.

Comments: AI & Soc (2025)

arXiv:2406.12346 [pdf, other]

Towards the Certification of Hybrid Architectures: Analysing Interference on Hardware Accelerators through PML

Authors: Benjamin Lesage, Frédéric Boniol, Kevin Delmas, Adrien Gauffriau, Alfonso Mascarenas Gonzalez, Claire Pagetti

Abstract: The emergence of Deep Neural Network (DNN) and machine learning-based applications paved the way for a new generation of hybrid hardware platforms. Hybrid platforms embed several cores and accelerators in a small package. However, in order to satisfy the Size, Weight and Power (SWaP) constraints, limited and shared resources are integrated. This paper presents an overview of the standards applicable to the certification of hybrid platforms and an early mapping of their objectives to said platforms. In particular, we consider how the classification of AMC20-152A for airborne electronic hardware applies to hybrid platforms. We also consider AMC20-193 for multi-core platforms, and how this standard fits different types of accelerators.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 12th European Congress on Embedded Real Time Software and Systems (ERTS 2024), Jun 2024, Toulouse, France

arXiv:2405.10054 [pdf, other]

A finite-sample generalization bound for stable LPV systems

Authors: Daniel Racz, Martin Gonzalez, Mihaly Petreczky, Andras Benczur, Balint Daroczy

Abstract: One of the main theoretical challenges in learning dynamical systems from data is providing upper bounds on the generalization error, that is, the difference between the expected prediction error and the empirical prediction error measured on some finite sample. In machine learning, a popular class of such bounds are the so-called Probably Approximately Correct (PAC) bounds. In this paper, we derive a PAC bound for stable continuous-time linear parameter-varying (LPV) systems. Our bound depends on the H2 norm of the chosen class of the LPV systems, but does not depend on the time interval for which the signals are considered.

Submitted 21 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: 8 pages, 1 figure, under review

MSC Class: 68; ACM Class: I.2.0

arXiv:2404.16208 [pdf, other]

GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

Authors: Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu

Abstract: Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible and highly parameterized design to study implementation bottlenecks, tune architectural parameters or modify neuron behavior based on application insights and study the trade space on hardware performance and network accuracy. In designing architectures for use in neuromorphic computing, there are an incredibly large number of configuration parameters such as number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with a streamlined productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to 780 times speedup compared to the serial version of the RANC simulator based on a 512 neuromorphic core MNIST inference application. We believe that the RANC ecosystem now provides a much more feasible avenue in the research of exploring different optimizations for accelerating SNNs and performing richer studies by enabling rapid convergence to optimized neuromorphic architectures.

Submitted 24 April, 2024; originally announced April 2024.

Comments: Accepted for publication in Neuro-Inspired Computational Elements (NICE) Workshop 2024

arXiv:2403.15491 [pdf, other]

doi 10.26342/2024-73-7

Open Conversational LLMs do not know most Spanish words

Authors: Javier Conde, Miguel González, Nina Melero, Raquel Ferrando, Gonzalo Martínez, Elena Merino-Gómez, José Alberto Hernández, Pedro Reviriego

Abstract: The growing interest in Large Language Models (LLMs) and in particular in conversational models with which users can interact has led to the development of a large number of open-source chat LLMs. These models are evaluated on a wide range of benchmarks to assess their capabilities in answering questions or solving problems on almost any possible topic or to test their ability to reason or interpret texts. In contrast, the evaluation of the knowledge that these models have of languages, for example the words that they can recognize and use in different languages, has received much less attention. In this paper, we evaluate the knowledge that open-source chat LLMs have of Spanish words by testing a sample of words in a reference dictionary. The results show that open-source chat LLMs produce incorrect meanings for an important fraction of the words and are not able to use most of the words correctly to write sentences with context. These results show how Spanish is left behind in the open-source LLM race and highlight the need to push for linguistic fairness in conversational LLMs ensuring that they provide similar performance across languages.

Submitted 24 September, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Procesamiento del Lenguaje Natural, 73, 95-108

Journal ref: Procesamiento del Lenguaje Natural, n. 73, 2024. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6603

arXiv:2402.15243 [pdf, other]

Safety-Conscious Pushing on Diverse Oriented Surfaces with Underactuated Aerial Vehicles

Authors: Tong Hui, Manuel J. Fernandez Gonzalez, Matteo Fumagalli

Abstract: Pushing tasks performed by aerial manipulators can be used for contact-based industrial inspections. Underactuated aerial vehicles are widely employed in aerial manipulation due to their widespread availability and relatively low cost. Industrial infrastructures often consist of diverse oriented work surfaces. When interacting with such surfaces, the coupled gravity compensation and interaction force generation of underactuated aerial vehicles can present the potential challenge of near-saturation operations. The blind utilization of these platforms for such tasks can lead to instability and accidents, creating unsafe operating conditions and potentially damaging the platform. In order to ensure safe pushing on these surfaces while managing platform saturation, this work establishes a safety assessment process. This process involves the prediction of the saturation level of each actuator during pushing across variable surface orientations. Furthermore, the assessment results are used to plan and execute physical experiments, ensuring safe operations and preventing platform damage.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

arXiv:2401.16247 [pdf, other]

Towards Red Teaming in Multimodal and Multilingual Translation

Authors: Christophe Ropers, David Dale, Prangthip Hansanti, Gabriel Mejia Gonzalez, Ivan Evtimov, Corinne Wong, Christophe Touret, Kristina Pereyra, Seohyun Sonia Kim, Cristian Canton Ferrer, Pierre Andrews, Marta R. Costa-jussà

Abstract: Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results and overestimation of model performance. As a consequence, human evaluation is gaining increasing interest as a means to assess the performance and reliability of models. One such method is the red teaming approach, which aims to generate edge cases where a model will produce critical errors. While this methodology is becoming standard practice for generative AI, its application to the realm of conditional AI remains largely unexplored. This paper presents the first study on human-based red teaming for Machine Translation (MT), marking a significant step towards understanding and improving the performance of translation models. We delve into both human-based red teaming and a study on automation, reporting lessons learned and providing recommendations for both translation models and red teaming drills. This pioneering work opens up new avenues for research and development in the field of MT.

Submitted 29 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2312.05187

ACM Class: I.2.7

arXiv:2312.05187 [pdf, other]

Seamless: Multilingual Expressive and Streaming Speech Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.18491 [pdf, other]

ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs

Authors: Violeta Menéndez González, Andrew Gilbert, Graeme Phillipson, Stephen Jolly, Simon Hadfield

Abstract: In the field of media production, video editing techniques play a pivotal role. Recent approaches have had great success at performing novel view image synthesis of static scenes. But adding temporal information adds an extra layer of complexity. Previous models have focused on implicitly representing static and dynamic scenes using NeRF. These models achieve impressive results but are costly at training and inference time. They overfit an MLP to describe the scene implicitly as a function of position. This paper proposes ZeST-NeRF, a new approach that can produce temporal NeRFs for new scenes without retraining. We can accurately reconstruct novel views using multi-view synthesis techniques and scene flow-field estimation, trained only with unrelated scenes. We demonstrate how existing state-of-the-art approaches from a range of fields cannot adequately solve this new task and demonstrate the efficacy of our solution. The resulting network improves quantitatively by 15% and produces significantly better visual results.

Submitted 30 November, 2023; originally announced November 2023.

Comments: VUA BMVC 2023

arXiv:2311.11742 [pdf, other]

Fuzzy Information Seeded Region Growing for Automated Lesions After Stroke Segmentation in MR Brain Images

Authors: Mario Pascual González

Abstract: In the realm of medical imaging, precise segmentation of stroke lesions from brain MRI images stands as a critical challenge with significant implications for patient diagnosis and treatment. Addressing this, our study introduces an innovative approach using a Fuzzy Information Seeded Region Growing (FISRG) algorithm. Designed to effectively delineate the complex and irregular boundaries of stroke lesions, the FISRG algorithm combines fuzzy logic with Seeded Region Growing (SRG) techniques, aiming to enhance segmentation accuracy. The research involved three experiments to optimize the FISRG algorithm's performance, each focusing on different parameters to improve the accuracy of stroke lesion segmentation. The highest Dice score achieved in these experiments was 94.2%, indicating a high degree of similarity between the algorithm's output and the expert-validated ground truth. Notably, the best average Dice score, amounting to 88.1%, was recorded in the third experiment, highlighting the efficacy of the algorithm in consistently segmenting stroke lesions across various slices. Our findings reveal the FISRG algorithm's strengths in handling the heterogeneity of stroke lesions. However, challenges remain in areas of abrupt lesion topology changes and in distinguishing lesions from similar intensity brain regions. The results underscore the potential of the FISRG algorithm in contributing significantly to advancements in medical imaging analysis for stroke diagnosis and treatment.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 10 pages, 14 figures. Associated code and data available at: https://github.com/Mawio02/FISRG-for-Automated-Lesion-After-Stroke-Segmentation-in-MRI

MSC Class: 92C55

arXiv:2308.16599 [pdf, other]

Using machine learning to understand causal relationships between urban form and travel CO2 emissions across continents

Authors: Felix Wagner, Florian Nachtigall, Lukas Franken, Nikola Milojevic-Dupont, Rafael H. M. Pereira, Nicolas Koch, Jakob Runge, Marta Gonzalez, Felix Creutzig

Abstract: Climate change mitigation in urban mobility requires policies reconfiguring urban form to increase accessibility and facilitate low-carbon modes of transport. However, current policy research has insufficiently assessed urban form effects on car travel at three levels: (1) Causality -- Can causality be established beyond theoretical and correlation-based analyses? (2) Generalizability -- Do relationships hold across different cities and world regions? (3) Context specificity -- How do relationships vary across neighborhoods of a city? Here, we address all three gaps via causal graph discovery and explainable machine learning to detect urban form effects on intra-city car travel, based on mobility data of six cities across three continents. We find significant causal effects of urban form on trip emissions and inter-feature effects, which had been neglected in previous work. Our results demonstrate that destination accessibility matters most overall, while low density and low connectivity also sharply increase CO$_2$ emissions. These general trends are similar across cities but we find idiosyncratic effects that can lead to substantially different recommendations. In more monocentric cities, we identify spatial corridors -- about 10--50 km from the city center -- where subcenter-oriented development is more relevant than increased access to the main center. Our work demonstrates a novel application of machine learning that enables new research addressing the needs of causality, generalizability, and contextual specificity for scaling evidence-based urban climate solutions.

Submitted 15 December, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: 32 pages, 24 figures, 6 tables

arXiv:2308.11596 [pdf, other]

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

ACM Class: I.2.7

arXiv:2308.02534 [pdf, other]

Exploring the Role of Explainability in AI-Assisted Embryo Selection

Authors: Lucia Urcelay, Daniel Hinjos, Pablo A. Martin-Torres, Marta Gonzalez, Marta Mendez, Salva Cívico, Sergio Álvarez-Napagao, Dario Garcia-Gasulla

Abstract: In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryos for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making is key. In this paper we analyze the current work in the explainability of AI-assisted embryo analysis models, identifying the limitations. We also discuss how these models could be integrated in the clinical context as decision support systems, considering the needs of clinicians and patients. Finally, we propose guidelines for the sake of increasing interpretability and trustworthiness, pushing this technology forward towards established clinical practice.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2306.14258 [pdf, ps, other]

A Neural RDE approach for continuous-time non-Markovian stochastic control problems

Authors: Melker Hoglund, Emilio Ferrucci, Camilo Hernandez, Aitor Muguruza Gonzalez, Cristopher Salvi, Leandro Sanchez-Betancourt, Yufei Zhang

Abstract: We propose a novel framework for solving continuous-time non-Markovian stochastic control problems by means of neural rough differential equations (Neural RDEs) introduced in Morrill et al. (2021). Non-Markovianity naturally arises in control problems due to the time delay effects in the system coefficients or the driving noises, which leads to optimal control strategies depending explicitly on the historical trajectories of the system state. By modelling the control process as the solution of a Neural RDE driven by the state process, we show that the control-state joint dynamics are governed by an uncontrolled, augmented Neural RDE, allowing for fast Monte-Carlo estimation of the value function via trajectories simulation and memory-efficient backpropagation. We provide theoretical underpinnings for the proposed algorithmic framework by demonstrating that Neural RDEs serve as universal approximators for functions of random rough paths. Exhaustive numerical experiments on non-Markovian stochastic control problems are presented, which reveal that the proposed framework is time-resolution-invariant and achieves higher accuracy and better stability in irregular sampling compared to existing RNN-based approaches.

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted at ICML 2023, Workshop on New Frontiers in Learning, Control, and Dynamical Systems

arXiv:2306.06194 [pdf, other]

Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration

Authors: Juan D. Caicedo, Carlos Guirado, Marta C. González, Joan L. Walker

Abstract: This research foregrounds general practices in travel demand research, emphasizing the need to change our ways. A critical barrier preventing travel demand literature from effectively informing policy is the volume of publications without clear, consolidated benchmarks, making it difficult for researchers and policymakers to gather insights and use models to guide decision-making. By emphasizing reproducibility and open collaboration, we aim to enhance the reliability and policy relevance of travel demand research. We present a collaborative infrastructure for transit demand prediction models, focusing on their performance during highly dynamic conditions like the COVID-19 pandemic. Drawing from over 300 published papers, we develop an open-source infrastructure with five common methodologies and assess their performance under stable and dynamic conditions. We found that the prediction error for the LSTM deep learning approach stabilized at a mean arctangent absolute percentage error (MAAPE) of about 0.12 within 1.5 months, whereas other models continued to exhibit higher error rates even a year into the pandemic. If research practices had prioritized reproducibility before the COVID-19 pandemic, transit agencies would have had clearer guidance on the best forecasting methods and quickly identified those best suited for pandemic conditions to inform operations in response to changes in transit demand. The aim of this open-source codebase is to lower the barrier for other researchers to replicate, reproduce models and build upon findings. We encourage researchers to test their own modeling approaches on this benchmarking platform, challenge the analyses conducted in this paper, and develop model specifications that can outperform those evaluated here. Further, collaborative research approaches must be expanded across travel demand modeling if we wish to impact policy and planning.

Submitted 14 July, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: 18 pages, 8 figures
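
Note on the metric named in the abstract above: MAAPE is commonly defined as the mean of arctan(|(actual - forecast) / actual|), which bounds each term between 0 and pi/2 radians. The following is a minimal sketch of that general definition, not the paper's code; the function name `maape` and the handling of zero actual values are assumptions made for illustration.

```python
import math


def maape(actual, forecast):
    """Mean arctangent absolute percentage error, in radians (0 to pi/2).

    Illustrative sketch only: the name and the zero-handling rule below
    are assumptions, not taken from the listed paper.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # Assumption: a zero actual with a nonzero error is treated as the
            # limit arctan(inf) = pi/2; zero actual and zero forecast counts as 0.
            total += 0.0 if f == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(actual)
```

Under this definition, a forecast that is off by 100% of the actual value contributes arctan(1) = pi/4 (about 0.785), so the reported value of about 0.12 corresponds to a typical relative error of roughly 12%, since arctan(x) is close to x for small x.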

arXiv:2305.14267 [pdf, other]

SEEDS: Exponential SDE Solvers for Fast High-Quality Sampling from Diffusion Models

Authors: Martin Gonzalez, Nelson Fernandez, Thuy Tran, Elies Gherbi, Hatem Hajri, Nader Masmoudi

Abstract: A potent class of generative models known as Diffusion Probabilistic Models (DPMs) has become prominent. A forward diffusion process gradually adds noise to data, while a model learns to gradually denoise. Sampling from pre-trained DPMs is obtained by solving differential equations (DE) defined by the learnt model, a process which has been shown to be prohibitively slow. Numerous efforts on speeding up this process have consisted of crafting powerful ODE solvers. Despite being quick, such solvers do not usually reach the optimal quality achieved by available slow SDE solvers. Our goal is to propose SDE solvers that reach optimal quality without requiring several hundreds or thousands of NFEs to achieve that goal. We propose Stochastic Explicit Exponential Derivative-free Solvers (SEEDS), improving and generalizing Exponential Integrator approaches to the stochastic case on several frameworks. After carefully analyzing the formulation of exact solutions of diffusion SDEs, we craft SEEDS to analytically compute the linear part of such solutions. Inspired by the Exponential Time-Differencing method, SEEDS use a novel treatment of the stochastic components of solutions, enabling the analytical computation of their variance, and contain high-order terms allowing them to reach optimal quality sampling $\sim3$-$5\times$ faster than previous SDE methods. We validate our approach on several image generation benchmarks, showing that SEEDS outperform or are competitive with previous SDE solvers. Contrary to the latter, SEEDS are derivative- and training-free, and we fully prove strong convergence guarantees for them.

Submitted 26 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 60 pages. Camera-Ready version for the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

MSC Class: I.2.6

arXiv:2211.07301 [pdf, other]

SVS: Adversarial refinement for sparse novel view synthesis

Authors: Violeta Menéndez González, Andrew Gilbert, Graeme Phillipson, Stephen Jolly, Simon Hadfield

Abstract: This paper proposes Sparse View Synthesis. This is a view synthesis problem where the number of reference views is limited, and the baseline between target and reference view is significant. Under these conditions, current radiance field methods fail catastrophically due to inescapable artifacts such as 3D floating blobs, blurring and structural duplication, whenever the number of reference views is limited, or the target view diverges significantly from the reference views. Advances in network architecture and loss regularisation are unable to satisfactorily remove these artifacts. The occlusions within the scene ensure that the true contents of these regions are simply not available to the model. In this work, we instead focus on hallucinating plausible scene contents within such regions. To this end we unify radiance field models with adversarial learning and perceptual losses. The resulting system provides up to 60% improvement in perceptual accuracy compared to current state-of-the-art radiance field models on this problem.

Submitted 14 November, 2022; originally announced November 2022.

Comments: BMVC 2022

arXiv:2210.10865 [pdf, other]

Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization

Authors: Thomas Lew, Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, Jie Tan, Montserrat Gonzalez

Abstract: We propose a framework to enable multipurpose assistive mobile robots to autonomously wipe tables to clean spills and crumbs. This problem is challenging, as it requires planning wiping actions while reasoning over uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations. Simultaneously, we must guarantee constraints satisfaction to enable safe deployment in unstructured cluttered environments. To tackle this problem, we first propose a stochastic differential equation to model crumbs and spill dynamics and absorption with a robot wiper. Using this model, we train a vision-based policy for planning wiping actions in simulation using reinforcement learning (RL). To enable zero-shot sim-to-real deployment, we dovetail the RL policy with a whole-body trajectory optimization framework to compute base and arm joint trajectories that execute the desired wiping motions while guaranteeing constraints satisfaction. We extensively validate our approach in simulation and on hardware. Video: https://youtu.be/inORKP4F3EI

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2210.09817 [pdf, other]

Universal hidden monotonic trend estimation with contrastive learning

Authors: Edouard Pineau, Sébastien Razakarivony, Mauricio Gonzalez, Anthony Schrapffer

Abstract: In this paper, we describe a universal method for extracting the underlying monotonic trend factor from time series data. We propose an approach related to the Mann-Kendall test, a standard monotonic trend detection method and call it contrastive trend estimation (CTE). We show that the CTE method identifies any hidden trend underlying temporal data while avoiding the standard assumptions used for monotonic trend identification. In particular, CTE can take any type of temporal data (vector, images, graphs, time series, etc.) as input. We finally illustrate the interest of our CTE method through several experiments on different types of data and problems.

Submitted 23 April, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2209.07888 [pdf, other]

TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM

Authors: Mathieu Gonzalez, Eric Marchand, Amine Kacete, Jérôme Royan

Abstract: Most classical SLAM systems rely on the static scene assumption, which limits their applicability in real world scenarios. Recent SLAM frameworks have been proposed to simultaneously track the camera and moving objects. However they are often unable to estimate the canonical pose of the objects and exhibit a low object tracking accuracy. To solve this problem we propose TwistSLAM++, a semantic, dynamic, SLAM system that fuses stereo images and LiDAR information. Using semantic information, we track potentially moving objects and associate them to 3D object detections in LiDAR scans to obtain their pose and size. Then, we perform registration on consecutive object scans to refine object pose estimation. Finally, object scans are used to estimate the shape of the object and constrain map points to lie on the estimated surface within the BA. We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.

Submitted 22 March, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

arXiv:2207.04672 [pdf]

No Language Left Behind: Scaling Human-Centered Machine Translation

Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, et al. (14 additional authors not shown)

Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open source all contributions described in this work, accessible at https://github.com/facebookresearch/fairseq/tree/nllb.

Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: 190 pages

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2207.02132 [pdf, other]

Deterministic Decoupling of Global Features and its Application to Data Analysis

Authors: Eduardo Martinez-Enriquez, Maria del Mar Gonzalez, Javier Portilla

Abstract: We introduce a method for deterministic decoupling of global features and show its applicability to improve data analysis performance, as well as to open new venues for feature transfer. We propose a new formalism that is based on defining transformations on submanifolds, by following trajectories along the features gradients. Through these transformations we define a normalization that, we demonstrate, allows for decoupling differentiable features. By applying this to sampling moments, we obtain a quasi-analytic solution for the orthokurtosis, a normalized version of the kurtosis that is not just decoupled from mean and variance, but also from skewness. We apply this method in the original data domain and at the output of a filter bank to regression and classification problems based on global descriptors, obtaining a consistent and significant improvement in performance as compared to using classical (non-decoupled) descriptors.

Submitted 5 July, 2022; originally announced July 2022.

Comments: 29 pages, 12 figures

ACM Class: I.4.7; I.5.1

arXiv:2206.08237 [pdf, other]

Noisy Learning for Neural ODEs Acts as a Robustness Locus Widening

Authors: Martin Gonzalez, Hatem Hajri, Loic Cantat, Mihaly Petreczky

Abstract: We investigate the problems and challenges of evaluating the robustness of Differential Equation-based (DE) networks against synthetic distribution shifts. We propose a novel and simple accuracy metric which can be used to evaluate intrinsic robustness and to validate dataset corruption simulators. We also propose methodology recommendations, destined for evaluating the many faces of neural DEs' robustness and for comparing them with their discrete counterparts rigorously. We then use these criteria to evaluate a cheap data augmentation technique as a reliable way for demonstrating the natural robustness of neural ODEs against simulated image corruptions across multiple datasets.

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted at ICML 2022 Workshop "PODS"

ACM Class: I.2.6

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.11989 [pdf, ps, other]

doi 10.1016/j.sysconle.2023.105468

Realization Theory Of Recurrent Neural ODEs Using Polynomial System Embeddings

Authors: Martin Gonzalez, Thibault Defourneau, Hatem Hajri, Mihaly Petreczky

Abstract: In this paper we show that neural ODE analogs of recurrent (ODE-RNN) and Long Short-Term Memory (ODE-LSTM) networks can be algorithmically embedded into the class of polynomial systems. This embedding preserves input-output behavior and can suitably be extended to other neural DE architectures. We then use realization theory of polynomial systems to provide necessary conditions for an input-output map to be realizable by an ODE-LSTM and sufficient conditions for minimality of such systems. These results represent the first steps towards realization theory of recurrent neural ODE architectures, which is expected to be useful for model reduction and learning algorithm analysis of recurrent neural ODEs.

Submitted 1 August, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 10 pages. Corrected typos and added references

Journal ref: Systems & Control Letters 173 (2023)

arXiv:2205.07014 [pdf, other]

SaiNet: Stereo aware inpainting behind objects with generative networks

Authors: Violeta Menéndez González, Andrew Gilbert, Graeme Phillipson, Stephen Jolly, Simon Hadfield

Abstract: In this work, we present an end-to-end network for stereo-consistent image inpainting with the objective of inpainting large missing regions behind objects. The proposed model consists of an edge-guided UNet-like network using Partial Convolutions. We enforce multi-view stereo consistency by introducing a disparity loss. More importantly, we develop a training scheme where the model is learned from realistic stereo masks representing object occlusions, instead of the more common random masks. The technique is trained in a supervised way. Our evaluation shows competitive results compared to previous state-of-the-art techniques.

Submitted 14 May, 2022; originally announced May 2022.

Comments: Presented at AI4CC workshop at CVPR

arXiv:2202.12384 [pdf, other]

doi 10.1109/LRA.2022.3178150

TwistSLAM: Constrained SLAM in Dynamic Environment

Authors: Mathieu Gonzalez, Eric Marchand, Amine Kacete, Jérôme Royan

Abstract: Classical visual simultaneous localization and mapping (SLAM) algorithms usually assume the environment to be rigid. This assumption limits the applicability of those algorithms as they are unable to accurately estimate the camera poses and world structure in real life scenes containing moving objects (e.g. cars, bikes, pedestrians, etc.). To tackle this issue, we propose TwistSLAM: a semantic, dynamic and stereo SLAM system that can track dynamic objects in the environment. Our algorithm creates clusters of points according to their semantic class. Thanks to the definition of inter-cluster constraints modeled by mechanical joints (function of the semantic class), a novel constrained bundle adjustment is then able to jointly estimate both poses and velocities of moving objects along with the classical world structure and camera trajectory. We evaluate our approach on several sequences from the public KITTI dataset and demonstrate quantitatively that it improves camera and object tracking compared to state-of-the-art approaches.

Submitted 27 September, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: This work has been accepted at IEEE Robotics and Automation Letters

arXiv:2112.11853 [pdf, other]

Geodesic squared exponential kernel for non-rigid shape registration

Authors: Florent Jousse, Xavier Pennec, Hervé Delingette, Matilde Gonzalez

Abstract: This work addresses the problem of non-rigid registration of 3D scans, which is at the core of shape modeling techniques. Firstly, we propose a new kernel based on geodesic distances for the Gaussian Process Morphable Models (GPMMs) framework. The use of geodesic distances into the kernel makes it more adapted to the topological and geometric characteristics of the surface and leads to more realistic deformations around holes and curved areas. Since the kernel possesses hyperparameters we have optimized them for the task of face registration on the FaceWarehouse dataset. We show that the Geodesic squared exponential kernel performs significantly better than state of the art kernels for the task of face registration on all the 20 expressions of the FaceWarehouse dataset. Secondly, we propose a modification of the loss function used in the non-rigid ICP registration algorithm, that allows to weight the correspondences according to the confidence given to them. As a use case, we show that we can make the registration more robust to outliers in the 3D scans, such as non-skin parts.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021) PROCEEDINGS, Dec 2021, JODHPUR, India

arXiv:2110.14122 [pdf, other]

doi 10.1109/TSP.2021.3135689

Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection with Mutual Information Estimation

Authors: Mauricio E. Gonzalez, Jorge F. Silva, Miguel Videla, Marcos E. Orchard

Abstract: This work addresses testing the independence of two continuous and finite-dimensional random variables from the design of a data-driven partition. The empirical log-likelihood statistic is adopted to approximate the sufficient statistics of an oracle test against independence (that knows the two hypotheses). It is shown that approximating the sufficient statistics of the oracle test offers a learning criterion for designing a data-driven partition that connects with the problem of mutual information estimation. Applying these ideas in the context of a data-dependent tree-structured partition (TSP), we derive conditions on the TSP's parameters to achieve a strongly consistent distribution-free test of independence over the family of probabilities equipped with a density. Complementing this result, we present finite-length results that show our TSP scheme's capacity to detect the scenario of independence structurally with the data-driven partition as well as new sampling complexity bounds for this detection. Finally, some experimental analyses provide evidence regarding our scheme's advantage for testing independence compared with some strategies that do not use data-driven representations.

Submitted 26 October, 2021; originally announced October 2021.

Showing 1–50 of 111 results for author: Gonzalez, M