-
A Conceptual Framework for AI Capability Evaluations
Authors:
María Victoria Carro,
Denise Alejandra Mester,
Francisca Gauna Selasco,
Luca Nicolás Forziati Gangi,
Matheo Sandleris Musa,
Lola Ramos Pereyra,
Mario Leiva,
Juan Gustavo Corvalan,
María Vanina Martinez,
Gerardo Simari
Abstract:
As AI systems advance and integrate into society, well-designed and transparent evaluations are becoming essential tools in AI governance, informing decisions by providing evidence about system capabilities and risks. Yet there remains a lack of clarity on how to perform these assessments both comprehensively and reliably. To address this gap, we propose a conceptual framework for analyzing AI cap…
▽ More
As AI systems advance and integrate into society, well-designed and transparent evaluations are becoming essential tools in AI governance, informing decisions by providing evidence about system capabilities and risks. Yet there remains a lack of clarity on how to perform these assessments both comprehensively and reliably. To address this gap, we propose a conceptual framework for analyzing AI capability evaluations, offering a structured, descriptive approach that systematizes the analysis of widely used methods and terminology without imposing new taxonomies or rigid formats. This framework supports transparency, comparability, and interpretability across diverse evaluations. It also enables researchers to identify methodological weaknesses, assists practitioners in designing evaluations, and provides policymakers with an accessible tool to scrutinize, compare, and navigate complex evaluation landscapes.
△ Less
Submitted 22 June, 2025;
originally announced June 2025.
-
AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking
Authors:
Chunan Liu,
Aurelien Pelissier,
Yanjun Shao,
Lilian Denzler,
Andrew C. R. Martin,
Brooks Paige,
Mariia Rodriguez Martinez
Abstract:
Accurate prediction of antibody-antigen (Ab-Ag) binding affinity is essential for therapeutic design and vaccine development, yet the performance of current models is limited by noisy experimental labels, heterogeneous assay conditions, and poor generalization across the vast antibody and antigen sequence space. We introduce AbRank, a large-scale benchmark and evaluation framework that reframes af…
▽ More
Accurate prediction of antibody-antigen (Ab-Ag) binding affinity is essential for therapeutic design and vaccine development, yet the performance of current models is limited by noisy experimental labels, heterogeneous assay conditions, and poor generalization across the vast antibody and antigen sequence space. We introduce AbRank, a large-scale benchmark and evaluation framework that reframes affinity prediction as a pairwise ranking problem. AbRank aggregates over 380,000 binding assays from nine heterogeneous sources, spanning diverse antibodies, antigens, and experimental conditions, and introduces standardized data splits that systematically increase distribution shift, from local perturbations such as point mutations to broad generalization across novel antigens and antibodies. To ensure robust supervision, AbRank defines an m-confident ranking framework by filtering out comparisons with marginal affinity differences, focusing training on pairs with at least an m-fold difference in measured binding strength. As a baseline for the benchmark, we introduce WALLE-Affinity, a graph-based approach that integrates protein language model embeddings with structural information to predict pairwise binding preferences. Our benchmarks reveal significant limitations in current methods under realistic generalization settings and demonstrate that ranking-based training improves robustness and transferability. In summary, AbRank offers a robust foundation for machine learning models to generalize across the antibody-antigen space, with direct relevance for scalable, structure-aware antibody therapeutic design.
△ Less
Submitted 21 June, 2025;
originally announced June 2025.
-
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems
Authors:
Matias Martinez,
Xavier Franch
Abstract:
The rapid progress in Automated Program Repair (APR) has been driven by advances in AI, particularly large language models (LLMs) and agent-based systems. SWE-Bench is a recent benchmark designed to evaluate LLM-based repair systems using real issues and pull requests mined from 12 popular open-source Python repositories. Its public leaderboards, SWE-Bench Lite and SWE-Bench Verified, have become…
▽ More
The rapid progress in Automated Program Repair (APR) has been driven by advances in AI, particularly large language models (LLMs) and agent-based systems. SWE-Bench is a recent benchmark designed to evaluate LLM-based repair systems using real issues and pull requests mined from 12 popular open-source Python repositories. Its public leaderboards, SWE-Bench Lite and SWE-Bench Verified, have become central platforms for tracking progress and comparing solutions. However, because the submission process does not require detailed documentation, the architectural design and origin of many solutions remain unclear. In this paper, we present the first comprehensive study of all submissions to the SWE-Bench Lite (68 entries) and Verified (79 entries) leaderboards, analyzing 67 unique approaches across dimensions such as submitter type, product availability, LLM usage, and system architecture. Our findings reveal the dominance of proprietary LLMs (especially Claude 3.5/3.7), the presence of both agentic and non-agentic designs, and a contributor base spanning from individual developers to large tech companies.
△ Less
Submitted 20 June, 2025;
originally announced June 2025.
-
How Do People Revise Inconsistent Beliefs? Examining Belief Revision in Humans with User Studies
Authors:
Stylianos Loukas Vasileiou,
Antonio Rago,
Maria Vanina Martinez,
William Yeoh
Abstract:
Understanding how humans revise their beliefs in light of new information is crucial for developing AI systems which can effectively model, and thus align with, human reasoning. While theoretical belief revision frameworks rely on a set of principles that establish how these operations are performed, empirical evidence from cognitive psychology suggests that people may follow different patterns wh…
▽ More
Understanding how humans revise their beliefs in light of new information is crucial for developing AI systems which can effectively model, and thus align with, human reasoning. While theoretical belief revision frameworks rely on a set of principles that establish how these operations are performed, empirical evidence from cognitive psychology suggests that people may follow different patterns when presented with conflicting information. In this paper, we present three comprehensive user studies showing that people consistently prefer explanation-based revisions, i.e., those which are guided by explanations, that result in changes to their belief systems that are not necessarily captured by classical belief change theory. Our experiments systematically investigate how people revise their beliefs with explanations for inconsistencies, whether they are provided with them or left to formulate them themselves, demonstrating a robust preference for what may seem non-minimal revisions across different types of scenarios. These findings have implications for AI systems designed to model human reasoning or interact with humans, suggesting that such systems should accommodate explanation-based, potentially non-minimal belief revision operators to better align with human cognitive processes.
△ Less
Submitted 11 June, 2025;
originally announced June 2025.
-
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider
Authors:
James Giroux,
Michael Martinez,
Cristiano Fanelli
Abstract:
The integration of Deep Learning (DL) into experimental nuclear and particle physics has driven significant progress in simulation and reconstruction workflows. However, traditional simulation frameworks such as Geant4 remain computationally intensive, especially for Cherenkov detectors, where simulating optical photon transport through complex geometries and reflective surfaces introduces a major…
▽ More
The integration of Deep Learning (DL) into experimental nuclear and particle physics has driven significant progress in simulation and reconstruction workflows. However, traditional simulation frameworks such as Geant4 remain computationally intensive, especially for Cherenkov detectors, where simulating optical photon transport through complex geometries and reflective surfaces introduces a major bottleneck. To address this, we present an open, standalone fast simulation tool for Detection of Internally Reflected Cherenkov Light (DIRC) detectors, with a focus on the High-Performance DIRC (hpDIRC) at the future Electron-Ion Collider (EIC). Our framework incorporates a suite of generative models tailored to accelerate particle identification (PID) tasks by offering a scalable, GPU-accelerated alternative to full Geant4-based simulations. Designed with accessibility in mind, our simulation package enables both DL researchers and physicists to efficiently generate high-fidelity large-scale datasets on demand, without relying on complex traditional simulation stacks. This flexibility supports the development and benchmarking of novel DL-driven PID methods. Moreover, this fast simulation pipeline represents a critical step toward enabling EIC-wide PID strategies that depend on virtually unlimited simulated samples, spanning the full acceptance of the hpDIRC.
△ Less
Submitted 26 April, 2025;
originally announced April 2025.
-
From Stability to Inconsistency: A Study of Moral Preferences in LLMs
Authors:
Monika Jotautaite,
Mary Phuong,
Chatrik Singh Mangat,
Maria Angelica Martinez
Abstract:
As large language models (LLMs) increasingly integrate into our daily lives, it becomes crucial to understand their implicit biases and moral tendencies. To address this, we introduce a Moral Foundations LLM dataset (MFD-LLM) grounded in Moral Foundations Theory, which conceptualizes human morality through six core foundations. We propose a novel evaluation method that captures the full spectrum o…
▽ More
As large language models (LLMs) increasingly integrate into our daily lives, it becomes crucial to understand their implicit biases and moral tendencies. To address this, we introduce a Moral Foundations LLM dataset (MFD-LLM) grounded in Moral Foundations Theory, which conceptualizes human morality through six core foundations. We propose a novel evaluation method that captures the full spectrum of LLMs' revealed moral preferences by answering a range of real-world moral dilemmas. Our findings reveal that state-of-the-art models have remarkably homogeneous value preferences, yet demonstrate a lack of consistency.
△ Less
Submitted 8 April, 2025;
originally announced April 2025.
-
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Authors:
NVIDIA,
:,
Aaron Blakeman,
Aarti Basant,
Abhinav Khattar,
Adithya Renduchintala,
Akhiad Bercovich,
Aleksander Ficek,
Alexis Bjorlin,
Ali Taghibakhshi,
Amala Sanjay Deshmukh,
Ameya Sunil Mahabaleshwarkar,
Andrew Tao,
Anna Shors,
Ashwath Aithal,
Ashwin Poojary,
Ayush Dattagupta,
Balaram Buddharaju,
Bobby Chen,
Boris Ginsburg,
Boxin Wang,
Brandon Norick,
Brian Butterfield,
Bryan Catanzaro,
Carlo del Mundo
, et al. (176 additional authors not shown)
Abstract:
As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…
▽ More
As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
△ Less
Submitted 15 April, 2025; v1 submitted 4 April, 2025;
originally announced April 2025.
-
SLIDE: Sliding Localized Information for Document Extraction
Authors:
Divyansh Singh,
Manuel Nunez Martinez,
Bonnie J. Dorr,
Sonja Schmer Galunder
Abstract:
Constructing accurate knowledge graphs from long texts and low-resource languages is challenging, as large language models (LLMs) experience degraded performance with longer input chunks. This problem is amplified in low-resource settings where data scarcity hinders accurate entity and relationship extraction. Contextual retrieval methods, while improving retrieval accuracy, struggle with long doc…
▽ More
Constructing accurate knowledge graphs from long texts and low-resource languages is challenging, as large language models (LLMs) experience degraded performance with longer input chunks. This problem is amplified in low-resource settings where data scarcity hinders accurate entity and relationship extraction. Contextual retrieval methods, while improving retrieval accuracy, struggle with long documents. They truncate critical information in texts exceeding maximum context lengths of LLMs, significantly limiting knowledge graph construction. We introduce SLIDE (Sliding Localized Information for Document Extraction), a chunking method that processes long documents by generating local context through overlapping windows. SLIDE ensures that essential contextual information is retained, enhancing knowledge graph extraction from documents exceeding LLM context limits. It significantly improves GraphRAG performance, achieving a 24% increase in entity extraction and a 39% improvement in relationship extraction for English. For Afrikaans, a low-resource language, SLIDE achieves a 49% increase in entity extraction and an 82% improvement in relationship extraction. Furthermore, it improves upon state-of-the-art in question-answering metrics such as comprehensiveness, diversity and empowerment, demonstrating its effectiveness in multilingual and resource-constrained settings.
△ Less
Submitted 23 March, 2025;
originally announced March 2025.
-
Analysis of Eccentric Coaxial Waveguides Filled with Lossy Anisotropic Media via Finite Difference
Authors:
Raul O. Ribeiro,
Maria A. Martinez,
Guilherme S. Rosa,
Rafael A. Penchel
Abstract:
This study presents a finite difference method (FDM) to model the electromagnetic field propagation in eccentric coaxial waveguides filled with lossy uniaxially anisotropic media. The formulation utilizes conformal transformation to map the eccentric circular waveguide into an equivalent concentric one. In the concentric problem, we introduce a novel normalized Helmholtz equation to decouple TM an…
▽ More
This study presents a finite difference method (FDM) to model the electromagnetic field propagation in eccentric coaxial waveguides filled with lossy uniaxially anisotropic media. The formulation utilizes conformal transformation to map the eccentric circular waveguide into an equivalent concentric one. In the concentric problem, we introduce a novel normalized Helmholtz equation to decouple TM and TE modes, and we solve this non-homogeneous partial differential equation using the finite difference in cylindrical coordinates. The proposed approach was validated against perturbation-based, spectral element-based, and finite-integration-based numerical solutions. The preliminary results show that our solution is superior in computational time. Furthermore, our FDM formulation can be extended with minimal adaptations to model complex media problems, such as metamaterial devices, optical fibers, and geophysical exploration sensors.
△ Less
Submitted 23 January, 2025;
originally announced January 2025.
-
Energy consumption of code small language models serving with runtime engines and execution providers
Authors:
Francisco Durán,
Matias Martinez,
Patricia Lago,
Silverio Martínez-Fernández
Abstract:
Background. The rapid growth of Language Models (LMs), particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing LMs inference for energy efficiency is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands.
Aim. Our goal is to analyze the impact of deep le…
▽ More
Background. The rapid growth of Language Models (LMs), particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing LMs inference for energy efficiency is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands.
Aim. Our goal is to analyze the impact of deep learning runtime engines and execution providers on energy consumption, execution time, and computing-resource utilization from the point of view of software engineers conducting inference in the context of code SLMs.
Method. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations.
Results. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. Also, TORCH paired with CUDA exhibited efficient computing-resource utilization.
Conclusions. Serving configuration choice significantly impacts energy efficiency. While further research is needed, we recommend the above configurations best suited to software engineers' requirements for enhancing serving efficiency in energy and performance.
△ Less
Submitted 19 December, 2024;
originally announced December 2024.
-
Do Large Language Models Show Biases in Causal Learning?
Authors:
Maria Victoria Carro,
Francisca Gauna Selasco,
Denise Alejandra Mester,
Margarita Gonzales,
Mario A. Leiva,
Maria Vanina Martinez,
Gerardo I. Simari
Abstract:
Causal learning is the cognitive process of developing the capability of making causal inferences based on available information, often guided by normative principles. This process is prone to errors and biases, such as the illusion of causality, in which people perceive a causal relationship between two variables despite lacking supporting evidence. This cognitive bias has been proposed to underl…
▽ More
Causal learning is the cognitive process of developing the capability of making causal inferences based on available information, often guided by normative principles. This process is prone to errors and biases, such as the illusion of causality, in which people perceive a causal relationship between two variables despite lacking supporting evidence. This cognitive bias has been proposed to underlie many societal problems, including social prejudice, stereotype formation, misinformation, and superstitious thinking. In this research, we investigate whether large language models (LLMs) develop causal illusions, both in real-world and controlled laboratory contexts of causal learning and inference. To this end, we built a dataset of over 2K samples including purely correlational cases, situations with null contingency, and cases where temporal information excludes the possibility of causality by placing the potential effect before the cause. We then prompted the models to make statements or answer causal questions to evaluate their tendencies to infer causation erroneously in these structured settings. Our findings show a strong presence of causal illusion bias in LLMs. Specifically, in open-ended generation tasks involving spurious correlations, the models displayed bias at levels comparable to, or even lower than, those observed in similar studies on human subjects. However, when faced with null-contingency scenarios or temporal cues that negate causal relationships, where it was required to respond on a 0-100 scale, the models exhibited significantly higher bias. These findings suggest that the models have not uniformly, consistently, or reliably internalized the normative principles essential for accurate causal learning.
△ Less
Submitted 13 December, 2024;
originally announced December 2024.
-
Exploring Fully Convolutional Networks for the Segmentation of Hyperspectral Imaging Applied to Advanced Driver Assistance Systems
Authors:
Jon Gutiérrez-Zaballa,
Koldo Basterretxea,
Javier Echanobe,
M. Victoria Martínez,
Inés del Campo
Abstract:
Advanced Driver Assistance Systems (ADAS) are designed with the main purpose of increasing the safety and comfort of vehicle occupants. Most of current computer vision-based ADAS perform detection and tracking tasks quite successfully under regular conditions, but are not completely reliable, particularly under adverse weather and changing lighting conditions, neither in complex situations with ma…
▽ More
Advanced Driver Assistance Systems (ADAS) are designed with the main purpose of increasing the safety and comfort of vehicle occupants. Most of current computer vision-based ADAS perform detection and tracking tasks quite successfully under regular conditions, but are not completely reliable, particularly under adverse weather and changing lighting conditions, neither in complex situations with many overlapping objects. In this work we explore the use of hyperspectral imaging (HSI) in ADAS on the assumption that the distinct near infrared (NIR) spectral reflectances of different materials can help to better separate the objects in a driving scene. In particular, this paper describes some experimental results of the application of fully convolutional networks (FCN) to the image segmentation of HSI for ADAS applications. More specifically, our aim is to investigate to what extent the spatial features codified by convolutional filters can be helpful to improve the performance of HSI segmentation systems. With that aim, we use the HSI-Drive v1.1 dataset, which provides a set of labelled images recorded in real driving conditions with a small-size snapshot NIR-HSI camera. Finally, we analyze the implementability of such a HSI segmentation system by prototyping the developed FCN model together with the necessary hyperspectral cube preprocessing stage and characterizing its performance on an MPSoC.
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving
Authors:
Jon Gutiérrez-Zaballa,
Koldo Basterretxea,
Javier Echanobe,
M. Victoria Martínez,
Unai Martínez-Corral,
Óscar Mata Carballeira,
Inés del Campo
Abstract:
Most of current computer vision-based advanced driver assistance systems (ADAS) perform detection and tracking of objects quite successfully under regular conditions. However, under adverse weather and changing lighting conditions, and in complex situations with many overlapping objects, these systems are not completely reliable. The spectral reflectance of the different objects in a driving scene…
▽ More
Most of current computer vision-based advanced driver assistance systems (ADAS) perform detection and tracking of objects quite successfully under regular conditions. However, under adverse weather and changing lighting conditions, and in complex situations with many overlapping objects, these systems are not completely reliable. The spectral reflectance of the different objects in a driving scene beyond the visible spectrum can offer additional information to increase the reliability of these systems, especially under challenging driving conditions. Furthermore, this information may be significant enough to develop vision systems that allow for a better understanding and interpretation of the whole driving scene. In this work we explore the use of snapshot, video-rate hyperspectral imaging (HSI) cameras in ADAS on the assumption that the near infrared (NIR) spectral reflectance of different materials can help to better segment the objects in real driving scenarios. To do this, we have used the HSI-Drive 1.1 dataset to perform various experiments on spectral classification algorithms. However, the information retrieval of hyperspectral recordings in natural outdoor scenarios is challenging, mainly because of deficient colour constancy and other inherent shortcomings of current snapshot HSI technology, which poses some limitations to the development of pure spectral classifiers. In consequence, in this work we analyze to what extent the spatial features codified by standard, tiny fully convolutional network (FCN) models can improve the performance of HSI segmentation systems for ADAS applications.
The abstract above is truncated due to submission limits. For the full abstract, please refer to the published article.
△ Less
Submitted 28 November, 2024;
originally announced November 2024.
-
Rapid Deployment of Domain-specific Hyperspectral Image Processors with Application to Autonomous Driving
Authors:
Jon Gutiérrez-Zaballa,
Koldo Basterretxea,
Javier Echanobe,
Óscar Mata-Carballeira,
M. Victoria Martínez
Abstract:
The article discusses the use of low cost System-On-Module (SOM) platforms for the implementation of efficient hyperspectral imaging (HSI) processors for application in autonomous driving. The work addresses the challenges of shaping and deploying multiple layer fully convolutional networks (FCN) for low-latency, on-board image semantic segmentation using resource- and power-constrained processing…
▽ More
The article discusses the use of low cost System-On-Module (SOM) platforms for the implementation of efficient hyperspectral imaging (HSI) processors for application in autonomous driving. The work addresses the challenges of shaping and deploying multiple layer fully convolutional networks (FCN) for low-latency, on-board image semantic segmentation using resource- and power-constrained processing devices. The paper describes in detail the steps followed to redesign and customize a successfully trained HSI segmentation lightweight FCN that was previously tested on a high-end heterogeneous multiprocessing system-on-chip (MPSoC) to accommodate it to the constraints imposed by a low-cost SOM. This SOM features a lower-end but much cheaper MPSoC suitable for the deployment of automatic driving systems (ADS). In particular the article reports the data- and hardware-specific quantization techniques utilized to fit the FCN into a commercial fixed-point programmable AI coprocessor IP, and proposes a full customized post-training quantization scheme to reduce computation and storage costs without compromising segmentation accuracy.
△ Less
Submitted 26 November, 2024;
originally announced November 2024.
-
HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving
Authors:
Jon Gutiérrez-Zaballa,
Koldo Basterretxea,
Javier Echanobe,
M. Victoria Martínez,
Unai Martínez-Corral
Abstract:
We present the updated version of the HSI-Drive dataset aimed at developing automated driving systems (ADS) using hyperspectral imaging (HSI). The v2.0 version includes new annotated images from videos recorded during winter and fall in real driving scenarios. Added to the spring and summer images included in the previous v1.1 version, the new dataset contains 752 images covering the four seasons.…
▽ More
We present the updated version of the HSI-Drive dataset aimed at developing automated driving systems (ADS) using hyperspectral imaging (HSI). The v2.0 version includes new annotated images from videos recorded during winter and fall in real driving scenarios. Added to the spring and summer images included in the previous v1.1 version, the new dataset contains 752 images covering the four seasons. In this paper, we show the improvements achieved over previously published results obtained on the v1.1 dataset, showcasing the enhanced performance of models trained on the new v2.0 dataset. We also show the progress made in comprehensive scene understanding by experimenting with more capable image segmentation models. These models include new segmentation categories aimed at the identification of essential road safety objects such as the presence of vehicles and road signs, as well as highly vulnerable groups like pedestrians and cyclists. In addition, we provide evidence of the performance and robustness of the models when applied to segmenting HSI video sequences captured in various environments and conditions. Finally, for a correct assessment of the results described in this work, the constraints imposed by the processing platforms that can sensibly be deployed in vehicles for ADS must be taken into account. Thus, and although implementation details are out of the scope of this paper, we focus our research on the development of computationally efficient, lightweight ML models that can eventually operate at high throughput rates. The dataset and some examples of segmented videos are available in https://ipaccess.ehu.eus/HSI-Drive/.
△ Less
Submitted 26 November, 2024;
originally announced November 2024.
-
Balancing Transparency and Accuracy: A Comparative Analysis of Rule-Based and Deep Learning Models in Political Bias Classification
Authors:
Manuel Nunez Martinez,
Sonja Schmer-Galunder,
Zoey Liu,
Sangpil Youm,
Chathuri Jayaweera,
Bonnie J. Dorr
Abstract:
The unchecked spread of digital information, combined with increasing political polarization and the tendency of individuals to isolate themselves from opposing political viewpoints, has driven researchers to develop systems for automatically detecting political bias in media. This trend has been further fueled by discussions on social media. We explore methods for categorizing bias in US news art…
▽ More
The unchecked spread of digital information, combined with increasing political polarization and the tendency of individuals to isolate themselves from opposing political viewpoints, has driven researchers to develop systems for automatically detecting political bias in media. This trend has been further fueled by discussions on social media. We explore methods for categorizing bias in US news articles, comparing rule-based and deep learning approaches. The study highlights the sensitivity of modern self-learning systems to unconstrained data ingestion, while reconsidering the strengths of traditional rule-based systems. Applying both models to left-leaning (CNN) and right-leaning (FOX) news articles, we assess their effectiveness on data beyond the original training and test sets.This analysis highlights each model's accuracy, offers a framework for exploring deep-learning explainability, and sheds light on political bias in US news media. We contrast the opaque architecture of a deep learning model with the transparency of a linguistically informed rule-based model, showing that the rule-based model performs consistently across different data conditions and offers greater transparency, whereas the deep learning model is dependent on the training set and struggles with unseen data.
△ Less
Submitted 6 November, 2024;
originally announced November 2024.
-
CELI: Controller-Embedded Language Model Interactions
Authors:
Jan-Samuel Wagner,
Dave DeCaprio,
Abishek Chiffon Muthu Raja,
Jonathan M. Holman,
Lauren K. Brady,
Sky C. Cheung,
Hosein Barzekar,
Eric Yang,
Mark Anthony Martinez II,
David Soong,
Sriram Sridhar,
Han Si,
Brandon W. Higgs,
Hisham Hamadeh,
Scott Ogden
Abstract:
We introduce Controller-Embedded Language Model Interactions (CELI), a framework that integrates control logic directly within language model (LM) prompts, facilitating complex, multi-stage task execution. CELI addresses limitations of existing prompt engineering and workflow optimization techniques by embedding control logic directly within the operational context of language models, enabling dyn…
▽ More
We introduce Controller-Embedded Language Model Interactions (CELI), a framework that integrates control logic directly within language model (LM) prompts, facilitating complex, multi-stage task execution. CELI addresses limitations of existing prompt engineering and workflow optimization techniques by embedding control logic directly within the operational context of language models, enabling dynamic adaptation to evolving task requirements. Our framework transfers control from the traditional programming execution environment to the LMs, allowing them to autonomously manage computational workflows while maintaining seamless interaction with external systems and functions. CELI supports arbitrary function calls with variable arguments, bridging the gap between LMs' adaptive reasoning capabilities and conventional software paradigms' structured control mechanisms. To evaluate CELI's versatility and effectiveness, we conducted case studies in two distinct domains: code generation (HumanEval benchmark) and multi-stage content generation (Wikipedia-style articles). The results demonstrate notable performance improvements across a range of domains. CELI achieved a 4.9 percentage point improvement over the best reported score of the baseline GPT-4 model on the HumanEval code generation benchmark. In multi-stage content generation, 94.4% of CELI-produced Wikipedia-style articles met or exceeded first draft quality when optimally configured, with 44.4% achieving high quality. These outcomes underscore CELI's potential for optimizing AI-driven workflows across diverse computational domains.
△ Less
Submitted 18 October, 2024;
originally announced October 2024.
-
The formula for the completion time of project networks
Authors:
Manuel Castejón-Limas,
Gabriel Medina Martínez,
Virginia Riego del Castillo,
Laura Fernández-Robles
Abstract:
This paper formulates the completion time $τ$ of a project network as $ τ=\|\mathbf{R} \mathbf{t} \|_\infty $ where the rows of $\mathbf{R}$ are simple paths of the network and $\mathbf{t}$ is a column vector representing the duration of the activities. Considering this product as a linear transformation leads to interesting findings on the topological relevance of both paths and activities using…
▽ More
This paper formulates the completion time $τ$ of a project network as $ τ=\|\mathbf{R} \mathbf{t} \|_\infty $ where the rows of $\mathbf{R}$ are simple paths of the network and $\mathbf{t}$ is a column vector representing the duration of the activities. Considering this product as a linear transformation leads to interesting findings on the topological relevance of both paths and activities using singular value decomposition. The notion of spectral networks is introduced to condense the fundamental structure of the project network. A definition of project stress is introduced to establish a comparison index between two alternatives in terms of slack. Additionally, the Moore-Penrose inverse of $\mathbf{R}$ is presented to find the configuration of the durations of the activities resulting in a given simple path duration vector. Then, the systematic mapping review process carried out to assess our claims' novelty is reported. Finally, we reflect on the notion of relevance for paths and activities and the relationship of the incidence matrix with the proposed approach.
△ Less
Submitted 14 October, 2024;
originally announced October 2024.
-
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
Authors:
Leo McKee-Reid,
Christoph Sträter,
Maria Angelica Martinez,
Joe Needham,
Mikita Balesni
Abstract:
Previous work has shown that training "helpful-only" LLMs with reinforcement learning on a curriculum of gameable environments can lead models to generalize to egregious specification gaming, such as editing their own reward function or modifying task checklists to appear more successful. We show that gpt-4o, gpt-4o-mini, o1-preview, and o1-mini - frontier models trained to be helpful, harmless, a…
▽ More
Previous work has shown that training "helpful-only" LLMs with reinforcement learning on a curriculum of gameable environments can lead models to generalize to egregious specification gaming, such as editing their own reward function or modifying task checklists to appear more successful. We show that gpt-4o, gpt-4o-mini, o1-preview, and o1-mini - frontier models trained to be helpful, harmless, and honest - can engage in specification gaming without training on a curriculum of tasks, purely from in-context iterative reflection (which we call in-context reinforcement learning, "ICRL"). We also show that using ICRL to generate highly-rewarded outputs for expert iteration (compared to the standard expert iteration reinforcement learning algorithm) may increase gpt-4o-mini's propensity to learn specification-gaming policies, generalizing (in very rare cases) to the most egregious strategy where gpt-4o-mini edits its own reward function. Our results point toward the strong ability of in-context reflection to discover rare specification-gaming strategies that models might not exhibit zero-shot or with normal training, highlighting the need for caution when relying on alignment of LLMs in zero-shot settings.
△ Less
Submitted 8 October, 2024;
originally announced October 2024.
-
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks
Authors:
Roberto Alcover-Couso,
Juan C. SanMiguel,
Marcos Escudero-Viñolo,
Jose M Martínez
Abstract:
Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a layer-wise integration of merged models, aiming to ma…
▽ More
Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a layer-wise integration of merged models, aiming to maintain the distinctiveness of the task-specific final layers while unifying the initial layers, which are primarily associated with feature extraction. This approach ensures parameter consistency across all layers, essential for boosting performance. Moreover, it facilitates seamless integration of knowledge, enabling effective merging of models from different datasets and tasks. Specifically, we investigate its applicability in Unsupervised Domain Adaptation (UDA), an unexplored area for model merging, for Semantic and Panoptic Segmentation. Experimental results demonstrate substantial UDA improvements without additional costs for merging same-architecture models from distinct datasets ($\uparrow 2.6\%$ mIoU) and different-architecture models with a shared backbone ($\uparrow 6.8\%$ mIoU). Furthermore, merging Semantic and Panoptic Segmentation models increases mPQ by $\uparrow 7\%$. These findings are validated across a wide variety of UDA strategies, architectures, and datasets.
△ Less
Submitted 24 September, 2024;
originally announced September 2024.
-
K-Origins: Better Colour Quantification for Neural Networks
Authors:
Lewis Mason,
Mark Martinez
Abstract:
K-Origins is a neural network layer designed to improve image-based network performances when learning colour, or intensities, is beneficial. Over 250 encoder-decoder convolutional networks are trained and tested on 16-bit synthetic data, demonstrating that K-Origins improves semantic segmentation accuracy in two scenarios: object detection with low signal-to-noise ratios, and segmenting multiple…
▽ More
K-Origins is a neural network layer designed to improve image-based network performances when learning colour, or intensities, is beneficial. Over 250 encoder-decoder convolutional networks are trained and tested on 16-bit synthetic data, demonstrating that K-Origins improves semantic segmentation accuracy in two scenarios: object detection with low signal-to-noise ratios, and segmenting multiple objects that are identical in shape but vary in colour. K-Origins generates output features from the input features, $\textbf{X}$, by the equation $\textbf{Y}_k = \textbf{X}-\textbf{J}\cdot w_k$ for each trainable parameter $w_k$, where $\textbf{J}$ is a matrix of ones. Additionally, networks with varying receptive fields were trained to determine optimal network depths based on the dimensions of target classes, suggesting that receptive field lengths should exceed object sizes. By ensuring a sufficient receptive field length and incorporating K-Origins, we can achieve better semantic network performance.
△ Less
Submitted 3 September, 2024;
originally announced September 2024.
-
Advancing Interactive Explainable AI via Belief Change Theory
Authors:
Antonio Rago,
Maria Vanina Martinez
Abstract:
As AI models become ever more complex and intertwined in humans' daily lives, greater levels of interactivity of explainable AI (XAI) methods are needed. In this paper, we propose the use of belief change theory as a formal foundation for operators that model the incorporation of new information, i.e. user feedback in interactive XAI, to logical representations of data-driven classifiers. We argue…
▽ More
As AI models become ever more complex and intertwined in humans' daily lives, greater levels of interactivity of explainable AI (XAI) methods are needed. In this paper, we propose the use of belief change theory as a formal foundation for operators that model the incorporation of new information, i.e. user feedback in interactive XAI, to logical representations of data-driven classifiers. We argue that this type of formalisation provides a framework and a methodology to develop interactive explanations in a principled manner, providing warranted behaviour and favouring transparency and accountability of such interactions. Concretely, we first define a novel, logic-based formalism to represent explanatory information shared between humans and machines. We then consider real world scenarios for interactive XAI, with different prioritisations of new and existing knowledge, where our formalism may be instantiated. Finally, we analyse a core set of belief change postulates, discussing their suitability for our real world settings and pointing to particular challenges that may require the relaxation or reinterpretation of some of the theoretical assumptions underlying existing operators.
△ Less
Submitted 14 August, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Authors:
Matias Martinez
Abstract:
The recent surge of open-source large language models (LLMs) enables developers to create AI-based solutions while maintaining control over aspects such as privacy and compliance, thereby providing governance and ownership of the model deployment process. To utilize these LLMs, inference engines are needed. These engines load the model's weights onto available resources, such as GPUs, and process…
▽ More
The recent surge of open-source large language models (LLMs) enables developers to create AI-based solutions while maintaining control over aspects such as privacy and compliance, thereby providing governance and ownership of the model deployment process. To utilize these LLMs, inference engines are needed. These engines load the model's weights onto available resources, such as GPUs, and process queries to generate responses. The speed of inference, or performance, of the LLM, is critical for real-time applications, as it computes millions or billions of floating point operations per inference. Recently, advanced inference engines such as vLLM have emerged, incorporating novel mechanisms such as efficient memory management to achieve state-of-the-art performance. In this paper, we analyze the performance, particularly the throughput (tokens generated per unit of time), of 20 LLMs using two inference libraries: vLLM and HuggingFace's pipelines. We investigate how various hyperparameters, which developers must configure, influence inference performance. Our results reveal that throughput landscapes are irregular, with distinct peaks, highlighting the importance of hyperparameter optimization to achieve maximum performance. We also show that applying hyperparameter optimization when upgrading or downgrading the GPU model used for inference can improve throughput from HuggingFace pipelines by an average of 9.16% and 13.7%, respectively.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
Nemotron-4 340B Technical Report
Authors:
Nvidia,
:,
Bo Adler,
Niket Agarwal,
Ashwath Aithal,
Dong H. Anh,
Pallab Bhattacharya,
Annika Brundyn,
Jared Casper,
Bryan Catanzaro,
Sharon Clay,
Jonathan Cohen,
Sirshak Das,
Ayush Dattagupta,
Olivier Delalleau,
Leon Derczynski,
Yi Dong,
Daniel Egert,
Ellie Evans,
Aleksander Ficek,
Denys Fridman,
Shaona Ghosh,
Boris Ginsburg,
Igor Gitman,
Tomasz Grzegorzek
, et al. (58 additional authors not shown)
Abstract:
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…
▽ More
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
△ Less
Submitted 6 August, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Unlearning Information Bottleneck: Machine Unlearning of Systematic Patterns and Biases
Authors:
Ling Han,
Hao Huang,
Dustin Scheinost,
Mary-Anne Hartley,
María Rodríguez Martínez
Abstract:
Effective adaptation to distribution shifts in training data is pivotal for sustaining robustness in neural networks, especially when removing specific biases or outdated information, a process known as machine unlearning. Traditional approaches typically assume that data variations are random, which makes it difficult to adjust the model parameters accurately to remove patterns and characteristic…
▽ More
Effective adaptation to distribution shifts in training data is pivotal for sustaining robustness in neural networks, especially when removing specific biases or outdated information, a process known as machine unlearning. Traditional approaches typically assume that data variations are random, which makes it difficult to adjust the model parameters accurately to remove patterns and characteristics from unlearned data. In this work, we present Unlearning Information Bottleneck (UIB), a novel information-theoretic framework designed to enhance the process of machine unlearning that effectively leverages the influence of systematic patterns and biases for parameter adjustment. By proposing a variational upper bound, we recalibrate the model parameters through a dynamic prior that integrates changes in data distribution with an affordable computational cost, allowing efficient and accurate removal of outdated or unwanted data patterns and biases. Our experiments across various datasets, models, and unlearning methods demonstrate that our approach effectively removes systematic patterns and biases while maintaining the performance of models post-unlearning.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Optimization of resources for digital radio transmission over IBOC FM through max-min fairness
Authors:
Mónica Rico Martínez,
Juan Carlos Vesga Ferreira,
Joel Carroll Vargas,
María Consuelo Rodríguez Niño,
Andrés Alejandro Diaz Toro,
William Alexander Cuevas Carrero
Abstract:
The equitable distribution of resources in a network is a complex process, considering that not all nodes have the same requirements, and the In-Band On-Channel (IBOC) hybrid transmission system is no exception. The IBOC system utilizes a hybrid in-band transmission to simultaneously broadcast analog and digital audio over the FM band. This article proposes the use of a Max-Min Fairness (MMF) algo…
▽ More
The equitable distribution of resources in a network is a complex process, considering that not all nodes have the same requirements, and the In-Band On-Channel (IBOC) hybrid transmission system is no exception. The IBOC system utilizes a hybrid in-band transmission to simultaneously broadcast analog and digital audio over the FM band. This article proposes the use of a Max-Min Fairness (MMF) algorithm, with a strategy to optimize resource allocation for IBOC FM transmission in a multiservice scenario. Additionally, the MMF algorithm offers low computational complexity for implementation in low-cost embedded systems, aiming to achieve fair resource distribution and provide adequate Quality of Service (QoS) levels for each node in the RF network, considering channel conditions and traffic types. The article explores a scenario under saturated traffic conditions to assess the optimization capabilities of the MMF algorithm under well-defined traffic and channel conditions. The evaluation process yielded highly favorable results, indicating that theMMF algorithm can be considered a viable alternative for bandwidth optimization in digital broadcasting over IBOC on FM with 95% confidence, and it holds potential for implementation in other digital broadcasting system.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Authors:
Pablo Marcos-Manchón,
Roberto Alcover-Couso,
Juan C. SanMiguel,
Jose M. Martínez
Abstract:
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating high-quality images from text prompts, models such as Stable Diffusion have been successfully extended to the joint generation of semantic segmentation pseudo-masks. However, current extensions primarily rely on extracting attentions linked to prompt words used for image synthesis. This approach limits the gen…
▽ More
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating high-quality images from text prompts, models such as Stable Diffusion have been successfully extended to the joint generation of semantic segmentation pseudo-masks. However, current extensions primarily rely on extracting attentions linked to prompt words used for image synthesis. This approach limits the generation of segmentation masks derived from word tokens not contained in the text prompt. In this work, we introduce Open-Vocabulary Attention Maps (OVAM)-a training-free method for text-to-image diffusion models that enables the generation of attention maps for any word. In addition, we propose a lightweight optimization process based on OVAM for finding tokens that generate accurate attention maps for an object class with a single annotation. We evaluate these tokens within existing state-of-the-art Stable Diffusion extensions. The best-performing model improves its mIoU from 52.1 to 86.6 for the synthetic images' pseudo-masks, demonstrating that our optimized tokens are an efficient way to improve the performance of existing methods without architectural changes or retraining.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Nemotron-4 15B Technical Report
Authors:
Jupinder Parmar,
Shrimai Prabhumoye,
Joseph Jennings,
Mostofa Patwary,
Sandeep Subramanian,
Dan Su,
Chen Zhu,
Deepak Narayanan,
Aastha Jhunjhunwala,
Ayush Dattagupta,
Vibhu Jawa,
Jiwei Liu,
Ameya Mahabaleshwarkar,
Osvald Nitski,
Annika Brundyn,
James Maki,
Miguel Martinez,
Jiaxuan You,
John Kamalu,
Patrick LeGresley,
Denys Fridman,
Jared Casper,
Ashwath Aithal,
Oleksii Kuchaiev,
Mohammad Shoeybi
, et al. (2 additional authors not shown)
Abstract:
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remai…
▽ More
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
△ Less
Submitted 27 February, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Computational Complexity of Preferred Subset Repairs on Data-Graphs
Authors:
Nina Pardal,
Santiago Cifuentes,
Edwin Pin,
Maria Vanina Martinez,
Sergio Abriola
Abstract:
Preferences are a pivotal component in practical reasoning, especially in tasks that involve decision-making over different options or courses of action that could be pursued. In this work, we focus on repairing and querying inconsistent knowledge bases in the form of graph databases, which involves finding a way to solve conflicts in the knowledge base and considering answers that are entailed fr…
▽ More
Preferences are a pivotal component in practical reasoning, especially in tasks that involve decision-making over different options or courses of action that could be pursued. In this work, we focus on repairing and querying inconsistent knowledge bases in the form of graph databases, which involves finding a way to solve conflicts in the knowledge base and considering answers that are entailed from every possible repair, respectively. Without a priori domain knowledge, all possible repairs are equally preferred. Though that may be adequate for some settings, it seems reasonable to establish and exploit some form of preference order among the potential repairs. We study the problem of computing prioritized repairs over graph databases with data values, using a notion of consistency based on GXPath expressions as integrity constraints. We present several preference criteria based on the standard subset repair semantics, incorporating weights, multisets, and set-based priority levels. We show that it is possible to maintain the same computational complexity as in the case where no preference criterion is available for exploitation. Finally, we explore the complexity of consistent query answering in this setting and obtain tight lower and upper bounds for all the preference criteria introduced.
△ Less
Submitted 27 May, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Identifying architectural design decisions for achieving green ML serving
Authors:
Francisco Durán,
Silverio Martínez-Fernández,
Matias Martinez,
Patricia Lago
Abstract:
The growing use of large machine learning models highlights concerns about their increasing computational demands. While the energy consumption of their training phase has received attention, fewer works have considered the inference phase. For ML inference, the binding of ML models to the ML system for user access, known as ML serving, is a critical yet understudied step for achieving efficiency…
▽ More
The growing use of large machine learning models highlights concerns about their increasing computational demands. While the energy consumption of their training phase has received attention, fewer works have considered the inference phase. For ML inference, the binding of ML models to the ML system for user access, known as ML serving, is a critical yet understudied step for achieving efficiency in ML applications.
We examine the literature in ML architectural design decisions and Green AI, with a special focus on ML serving. The aim is to analyze ML serving architectural design decisions for the purpose of understanding and identifying them with respect to quality characteristics from the point of view of researchers and practitioners in the context of ML serving literature.
Our results (i) identify ML serving architectural design decisions along with their corresponding components and associated technological stack, and (ii) provide an overview of the quality characteristics studied in the literature, including energy efficiency.
This preliminary study is the first step in our goal to achieve green ML serving. Our analysis may aid ML researchers and practitioners in making green-aware architecture design decisions when serving their models.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
The Distributional Uncertainty of the SHAP score in Explainable Machine Learning
Authors:
Santiago Cifuentes,
Leopoldo Bertossi,
Nina Pardal,
Sergio Abriola,
Maria Vanina Martinez,
Miguel Romero
Abstract:
Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is g…
▽ More
Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it needs to be assigned subjectively or be estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning on SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework may contribute to a more robust feature scoring.
△ Less
Submitted 13 August, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Comparison of metaheuristics for the firebreak placement problem: a simulation-based optimization approach
Authors:
David Palacios-Meneses,
Jaime Carrasco,
Sebastián Dávila,
Maximiliano Martínez,
Rodrigo Mahaluf,
Andrés Weintraub
Abstract:
The problem of firebreak placement is crucial for fire prevention, and its effectiveness at landscape scale will depend on their ability to impede the progress of future wildfires. To provide an adequate response, it is therefore necessary to consider the stochastic nature of fires, which are highly unpredictable from ignition to extinction. Thus, the placement of firebreaks can be considered a st…
▽ More
The problem of firebreak placement is crucial for fire prevention, and its effectiveness at landscape scale will depend on their ability to impede the progress of future wildfires. To provide an adequate response, it is therefore necessary to consider the stochastic nature of fires, which are highly unpredictable from ignition to extinction. Thus, the placement of firebreaks can be considered a stochastic optimization problem where: (1) the objective function is to minimize the expected cells burnt of the landscape; (2) the decision variables being the location of firebreaks; and (3) the random variable being the spatial propagation/behavior of fires. In this paper, we propose a solution approach for the problem from the perspective of simulation-based optimization (SbO), where the objective function is not available (a black-box function), but can be computed (and/or approximated) by wildfire simulations. For this purpose, Genetic Algorithm and GRASP are implemented. The final implementation yielded favorable results for the Genetic Algorithm, demonstrating strong performance in scenarios with medium to high operational capacity, as well as medium levels of stochasticity
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Beyond Certificates: 6G-ready Access Control for the Service-Based Architecture with Decentralized Identifiers and Verifiable Credentials
Authors:
Sandro Rodriguez Garzon,
Hai Dinh Tuan,
Maria Mora Martinez,
Axel Küpper,
Hans Joachim Einsiedler,
Daniela Schneider
Abstract:
Next generation mobile networks are poised to transition from monolithic structures owned and operated by single mobile network operators into multi-stakeholder networks where various parties contribute with infrastructure, resources, and services. However, a federation of networks and services brings along a crucial challenge: Guaranteeing secure and trustworthy access control among network entit…
▽ More
Next generation mobile networks are poised to transition from monolithic structures owned and operated by single mobile network operators into multi-stakeholder networks where various parties contribute with infrastructure, resources, and services. However, a federation of networks and services brings along a crucial challenge: Guaranteeing secure and trustworthy access control among network entities of different administrative domains. This paper introduces a novel technical concept and a prototype, outlining and implementing a 5G Service-Based Architecture that utilizes Decentralized Identifiers and Verifiable Credentials instead of traditional X.509 certificates and OAuth2.0 access tokens to authenticate and authorize network functions among each other across administrative domains. This decentralized approach to identity and permission management for network functions reduces the risk of single points of failure associated with centralized public key infrastructures. It unifies access control mechanisms and lays the groundwork for lesser complex and more trustful cross-domain key management for highly collaborative network functions in a multi-party Service-Based Architecture of 6G.
△ Less
Submitted 23 February, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
On the Feasibility of Cross-Language Detection of Malicious Packages in npm and PyPI
Authors:
Piergiorgio Ladisa,
Serena Elisa Ponta,
Nicola Ronzoni,
Matias Martinez,
Olivier Barais
Abstract:
Current software supply chains heavily rely on open-source packages hosted in public repositories. Given the popularity of ecosystems like npm and PyPI, malicious users started to spread malware by publishing open-source packages containing malicious code. Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem. However, the scarcity of samples poses a chal…
▽ More
Current software supply chains heavily rely on open-source packages hosted in public repositories. Given the popularity of ecosystems like npm and PyPI, malicious users started to spread malware by publishing open-source packages containing malicious code. Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem. However, the scarcity of samples poses a challenge to the application of machine learning techniques in other ecosystems. Despite the differences between JavaScript and Python, the open-source software supply chain attacks targeting such languages show noticeable similarities (e.g., use of installation scripts, obfuscated strings, URLs).
In this paper, we present a novel approach that involves a set of language-independent features and the training of models capable of detecting malicious packages in npm and PyPI by capturing their commonalities. This methodology allows us to train models on a diverse dataset encompassing multiple languages, thereby overcoming the challenge of limited sample availability. We evaluate the models both in a controlled experiment (where labels of data are known) and in the wild by scanning newly uploaded packages for both npm and PyPI for 10 days.
We find that our approach successfully detects malicious packages for both npm and PyPI. Over an analysis of 31,292 packages, we reported 58 previously unknown malicious packages (38 for npm and 20 for PyPI), which were consequently removed from the respective repositories.
△ Less
Submitted 14 October, 2023;
originally announced October 2023.
-
DockGame: Cooperative Games for Multimeric Rigid Protein Docking
Authors:
Vignesh Ram Somnath,
Pier Giuseppe Sessa,
Maria Rodriguez Martinez,
Andreas Krause
Abstract:
Protein interactions and assembly formation are fundamental to most biological processes. Predicting the assembly structure from constituent proteins -- referred to as the protein docking task -- is thus a crucial step in protein design applications. Most traditional and deep learning methods for docking have focused mainly on binary docking, following either a search-based, regression-based, or g…
▽ More
Protein interactions and assembly formation are fundamental to most biological processes. Predicting the assembly structure from constituent proteins -- referred to as the protein docking task -- is thus a crucial step in protein design applications. Most traditional and deep learning methods for docking have focused mainly on binary docking, following either a search-based, regression-based, or generative modeling paradigm. In this paper, we focus on the less-studied multimeric (i.e., two or more proteins) docking problem. We introduce DockGame, a novel game-theoretic framework for docking -- we view protein docking as a cooperative game between proteins, where the final assembly structure(s) constitute stable equilibria w.r.t. the underlying game potential. Since we do not have access to the true potential, we consider two approaches - i) learning a surrogate game potential guided by physics-based energy functions and computing equilibria by simultaneous gradient updates, and ii) sampling from the Gibbs distribution of the true potential by learning a diffusion generative model over the action spaces (rotations and translations) of all proteins. Empirically, on the Docking Benchmark 5.5 (DB5.5) dataset, DockGame has much faster runtimes than traditional docking methods, can generate multiple plausible assembly structures, and achieves comparable performance to existing binary docking baselines, despite solving the harder task of coordinating multiple protein chains.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
Conformal Autoregressive Generation: Beam Search with Coverage Guarantees
Authors:
Nicolas Deutschmann,
Marvin Alberts,
María Rodríguez Martínez
Abstract:
We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algo…
▽ More
We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algorithm introduces the conformal set prediction procedure as part of the decoding process, producing a variable beam width which adapts to the current uncertainty. While more complex, this procedure can achieve coverage guarantees selected a priori. We provide marginal coverage bounds for each method, and evaluate them empirically on a selection of tasks drawing from natural language processing and chemistry.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
-
A quantum algorithm for track reconstruction in the LHCb vertex detector
Authors:
Davide Nicotra,
Miriam Lucio Martinez,
Jacco Andreas de Vries,
Marcel Merk,
Kurt Driessens,
Ronald Leonard Westra,
Domenica Dibenedetto,
Daniel Hugo Cámpora Pérez
Abstract:
High-energy physics is facing increasingly computational challenges in real-time event reconstruction for the near-future high-luminosity era. Using the LHCb vertex detector as a use-case, we explore a new algorithm for particle track reconstruction based on the minimisation of an Ising-like Hamiltonian with a linear algebra approach. The use of a classical matrix inversion technique results in tr…
▽ More
High-energy physics is facing increasingly computational challenges in real-time event reconstruction for the near-future high-luminosity era. Using the LHCb vertex detector as a use-case, we explore a new algorithm for particle track reconstruction based on the minimisation of an Ising-like Hamiltonian with a linear algebra approach. The use of a classical matrix inversion technique results in tracking performance similar to the current state-of-the-art but with worse scaling complexity in time. To solve this problem, we also present an implementation as quantum algorithm, using the Harrow-Hassadim-Lloyd (HHL) algorithm: this approach can potentially provide an exponential speedup as a function of the number of input hits over its classical counterpart, in spite of limitations due to the well-known HHL Hamiltonian simulation and readout problems. The findings presented in this paper shed light on the potential of leveraging quantum computing for real-time particle track reconstruction in high-energy physics.
△ Less
Submitted 17 October, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
The Hitchhiker's Guide to Malicious Third-Party Dependencies
Authors:
Piergiorgio Ladisa,
Merve Sahin,
Serena Elisa Ponta,
Marco Rosa,
Matias Martinez,
Olivier Barais
Abstract:
The increasing popularity of certain programming languages has spurred the creation of ecosystem-specific package repositories and package managers. Such repositories (e.g., npm, PyPI) serve as public databases that users can query to retrieve packages for various functionalities, whereas package managers automatically handle dependency resolution and package installation on the client side. These…
▽ More
The increasing popularity of certain programming languages has spurred the creation of ecosystem-specific package repositories and package managers. Such repositories (e.g., npm, PyPI) serve as public databases that users can query to retrieve packages for various functionalities, whereas package managers automatically handle dependency resolution and package installation on the client side. These mechanisms enhance software modularization and accelerate implementation. However, they have become a target for malicious actors seeking to propagate malware on a large scale.
In this work, we show how attackers can leverage capabilities of popular package managers and languages to achieve arbitrary code execution on victim machines, thereby realizing open-source software supply chain attacks. Based on the analysis of 7 ecosystems, we identify 3 install-time and 4 runtime techniques, and we provide recommendations describing how to reduce the risk when consuming third-party dependencies. We will provide proof-of-concepts that demonstrate the identified techniques. Furthermore, we describe evasion strategies employed by attackers to circumvent detection mechanisms.
△ Less
Submitted 6 October, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
An Analysis of Dialogue Repair in Virtual Voice Assistants
Authors:
Matthew Carson Galbraith,
Mireia Gómez i Martínez
Abstract:
Language speakers often use what are known as repair initiators to mend fundamental disconnects that occur between them during verbal communication. Previous research in this field has mainly focused on the human-to-human use of repair initiator. We proposed an examination of dialogue repair structure wherein the dialogue initiator is human and the party that initiates or responds to the repair is…
▽ More
Language speakers often use what are known as repair initiators to mend fundamental disconnects that occur between them during verbal communication. Previous research in this field has mainly focused on the human-to-human use of repair initiator. We proposed an examination of dialogue repair structure wherein the dialogue initiator is human and the party that initiates or responds to the repair is a virtual assistant. This study examined the use of repair initiators in both English and Spanish with two popular assistants, Google Assistant and Apple's Siri. Our aim was to codify the differences, if any, in responses by voice assistants to dialogues in need of repair as compared to human-human dialogues also in need of repair. Ultimately the data demonstrated that not only were there differences between human-assistant and human-human dialogue repair strategies, but that there were likewise differences among the assistants and the languages studied.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Adaptive Conformal Regression with Jackknife+ Rescaled Scores
Authors:
Nicolas Deutschmann,
Mattia Rigotti,
Maria Rodriguez Martinez
Abstract:
Conformal regression provides prediction intervals with global coverage guarantees, but often fails to capture local error distributions, leading to non-homogeneous coverage. We address this with a new adaptive method based on rescaling conformal scores with an estimate of local score distribution, inspired by the Jackknife+ method, which enables the use of calibration data in conformal scores wit…
▽ More
Conformal regression provides prediction intervals with global coverage guarantees, but often fails to capture local error distributions, leading to non-homogeneous coverage. We address this with a new adaptive method based on rescaling conformal scores with an estimate of local score distribution, inspired by the Jackknife+ method, which enables the use of calibration data in conformal scores without breaking calibration-test exchangeability. Our approach ensures formal global coverage guarantees and is supported by new theoretical results on local coverage, including an a posteriori bound on any calibration score. The strength of our approach lies in achieving local coverage without sacrificing calibration set size, improving the applicability of conformal prediction intervals in various settings. As a result, our method provides prediction intervals that outperform previous methods, particularly in the low-data regime, making it especially relevant for real-world applications such as healthcare and biomedical domains where uncertainty needs to be quantified accurately despite low sample data.
△ Less
Submitted 31 May, 2023;
originally announced May 2023.
-
Journey to the Center of Software Supply Chain Attacks
Authors:
Piergiorgio Ladisa,
Serena Elisa Ponta,
Antonino Sabetta,
Matias Martinez,
Olivier Barais
Abstract:
This work discusses open-source software supply chain attacks and proposes a general taxonomy describing how attackers conduct them. We then provide a list of safeguards to mitigate such attacks. We present our tool "Risk Explorer for Software Supply Chains" to explore such information and we discuss its industrial use-cases.
This work discusses open-source software supply chain attacks and proposes a general taxonomy describing how attackers conduct them. We then provide a list of safeguards to mitigate such attacks. We present our tool "Risk Explorer for Software Supply Chains" to explore such information and we discuss its industrial use-cases.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
Soft labelling for semantic segmentation: Bringing coherence to label down-sampling
Authors:
Roberto Alcover-Couso,
Marcos Escudero-Vinolo,
Juan C. SanMiguel,
Jose M. Martinez
Abstract:
In semantic segmentation, training data down-sampling is commonly performed due to limited resources, the need to adapt image size to the model input, or improve data augmentation. This down-sampling typically employs different strategies for the image data and the annotated labels. Such discrepancy leads to mismatches between the down-sampled color and label images. Hence, the training performanc…
▽ More
In semantic segmentation, training data down-sampling is commonly performed due to limited resources, the need to adapt image size to the model input, or improve data augmentation. This down-sampling typically employs different strategies for the image data and the annotated labels. Such discrepancy leads to mismatches between the down-sampled color and label images. Hence, the training performance significantly decreases as the down-sampling factor increases. In this paper, we bring together the down-sampling strategies for the image data and the training labels. To that aim, we propose a novel framework for label down-sampling via soft-labeling that better conserves label information after down-sampling. Therefore, fully aligning soft-labels with image data to keep the distribution of the sampled pixels. This proposal also produces reliable annotations for under-represented semantic classes. Altogether, it allows training competitive models at lower resolutions. Experiments show that the proposal outperforms other down-sampling strategies. Moreover, state-of-the-art performance is achieved for reference benchmarks, but employing significantly less computational resources than foremost approaches. This proposal enables competitive research for semantic segmentation under resource constraints.
△ Less
Submitted 19 February, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Aligned Diffusion Schrödinger Bridges
Authors:
Vignesh Ram Somnath,
Matteo Pariset,
Ya-Ping Hsieh,
Maria Rodriguez Martinez,
Andreas Krause,
Charlotte Bunne
Abstract:
Diffusion Schrödinger bridges (DSB) have recently emerged as a powerful framework for recovering stochastic dynamics via their marginal observations at different time points. Despite numerous successful applications, existing algorithms for solving DSBs have so far failed to utilize the structure of aligned data, which naturally arises in many biological phenomena. In this paper, we propose a nove…
▽ More
Diffusion Schrödinger bridges (DSB) have recently emerged as a powerful framework for recovering stochastic dynamics via their marginal observations at different time points. Despite numerous successful applications, existing algorithms for solving DSBs have so far failed to utilize the structure of aligned data, which naturally arises in many biological phenomena. In this paper, we propose a novel algorithmic framework that, for the first time, solves DSBs while respecting the data alignment. Our approach hinges on a combination of two decades-old ideas: The classical Schrödinger bridge theory and Doob's $h$-transform. Compared to prior methods, our approach leads to a simpler training procedure with lower variance, which we further augment with principled regularization schemes. This ultimately leads to sizeable improvements across experiments on synthetic and real data, including the tasks of predicting conformational changes in proteins and temporal evolution of cellular differentiation processes.
△ Less
Submitted 28 April, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Automating Crochet Patterns for Surfaces of Revolution
Authors:
Megan Martinez,
Amanda Taylor Lipnicki
Abstract:
A surface of revolution is created by taking a curve in the $xy$-plane and rotating it about some axis. We develop a program which automatically generates crochet patterns for surfaces by revolution when they are obtained by rotating about the $x$-axis. In order to accomplish this, we invoke the arclength integral to determine where to take measurements for each row. In addition, a distance measur…
▽ More
A surface of revolution is created by taking a curve in the $xy$-plane and rotating it about some axis. We develop a program which automatically generates crochet patterns for surfaces by revolution when they are obtained by rotating about the $x$-axis. In order to accomplish this, we invoke the arclength integral to determine where to take measurements for each row. In addition, a distance measure is created to optimally space increases and decreases. The result is a program that will take a function, $x$-bounds, crochet gauge, and a scale in order to produce a polished crochet pattern.
△ Less
Submitted 31 January, 2023;
originally announced February 2023.
-
Energy Efficiency of Training Neural Network Architectures: An Empirical Study
Authors:
Yinlena Xu,
Silverio Martínez-Fernández,
Matias Martinez,
Xavier Franch
Abstract:
The evaluation of Deep Learning models has traditionally focused on criteria such as accuracy, F1 score, and related measures. The increasing availability of high computational power environments allows the creation of deeper and more complex models. However, the computations needed to train such models entail a large carbon footprint. In this work, we study the relations between DL model architec…
▽ More
The evaluation of Deep Learning models has traditionally focused on criteria such as accuracy, F1 score, and related measures. The increasing availability of high computational power environments allows the creation of deeper and more complex models. However, the computations needed to train such models entail a large carbon footprint. In this work, we study the relations between DL model architectures and their environmental impact in terms of energy consumed and CO$_2$ emissions produced during training by means of an empirical study using Deep Convolutional Neural Networks. Concretely, we study: (i) the impact of the architecture and the location where the computations are hosted on the energy consumption and emissions produced; (ii) the trade-off between accuracy and energy efficiency; and (iii) the difference on the method of measurement of the energy consumed using software-based and hardware-based tools.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
Graph Convolutional Network for Multi-Target Multi-Camera Vehicle Tracking
Authors:
Elena Luna,
Juan Carlos San Miguel,
José María Martínez,
Marcos Escudero-Viñolo
Abstract:
This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach simultaneously processes all cameras providing a global solution, and it is also robust to large cameras unsynchronizations. Furthermore, we design a new loss function to deal…
▽ More
This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach simultaneously processes all cameras providing a global solution, and it is also robust to large cameras unsynchronizations. Furthermore, we design a new loss function to deal with class imbalance. Our proposal outperforms the related work showing better generalization and without requiring ad-hoc manual annotations or thresholds, unlike compared approaches.
△ Less
Submitted 28 November, 2022;
originally announced November 2022.
-
Energy Consumption of Automated Program Repair
Authors:
Matias Martinez,
Silverio Martínez-Fernández,
Xavier Franch
Abstract:
Automated program repair (APR) aims to automatize the process of repairing software bugs in order to reduce the cost of maintaining software programs. Moreover, the success (given by the accuracy metric) of APR approaches has increased in recent years. However, no previous work has considered the energy impact of repairing bugs automatically using APR. The field of green software research aims to…
▽ More
Automated program repair (APR) aims to automatize the process of repairing software bugs in order to reduce the cost of maintaining software programs. Moreover, the success (given by the accuracy metric) of APR approaches has increased in recent years. However, no previous work has considered the energy impact of repairing bugs automatically using APR. The field of green software research aims to measure the energy consumption required to develop, maintain, and use software products. This paper combines, for the first time, the APR and Green software research fields. We have as main goal to define the foundation for measuring the energy consumption of the APR activity. We measure the energy consumption of ten traditional program repair tools for Java and ten fine-tuned Large-Language Models (LLM) on source code trying to repair real bugs from Defects4J, a set of real buggy programs. The initial results from this experiment show the existing trade-off between energy consumption and the ability to correctly repair bugs: Some APR tools are capable of achieving higher accuracy by spending less energy than other tools.
△ Less
Submitted 5 February, 2024; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Towards the Detection of Malicious Java Packages
Authors:
Piergiorgio Ladisa,
Henrik Plate,
Matias Martinez,
Olivier Barais,
Serena Elisa Ponta
Abstract:
Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories and the development of vetting strategies to detect such attacks is ongoing research. Despite its popularity, the Java ecosystem is the less explored one in the context of supply chain attacks.
In this paper we p…
▽ More
Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories and the development of vetting strategies to detect such attacks is ongoing research. Despite its popularity, the Java ecosystem is the less explored one in the context of supply chain attacks.
In this paper we present indicators of malicious behavior that can be observed statically through the analysis of Java bytecode. Then we evaluate how such indicators and their combinations perform when detecting malicious code injections. We do so by injecting three malicious payloads taken from real-world examples into the Top-10 most popular Java libraries from libraries.io.
We found that the analysis of strings in the constant pool and of sensitive APIs in the bytecode instructions aid in the task of detecting malicious Java packages by significantly reducing the information, thus, making also manual triage possible.
△ Less
Submitted 8 October, 2022;
originally announced October 2022.
-
A ResNet is All You Need? Modeling A Strong Baseline for Detecting Referable Diabetic Retinopathy in Fundus Images
Authors:
Tomás Castilla,
Marcela S. Martínez,
Mercedes Leguía,
Ignacio Larrabide,
José Ignacio Orlando
Abstract:
Deep learning is currently the state-of-the-art for automated detection of referable diabetic retinopathy (DR) from color fundus photographs (CFP). While the general interest is put on improving results through methodological innovations, it is not clear how good these approaches perform compared to standard deep classification models trained with the appropriate settings. In this paper we propose…
▽ More
Deep learning is currently the state-of-the-art for automated detection of referable diabetic retinopathy (DR) from color fundus photographs (CFP). While the general interest is put on improving results through methodological innovations, it is not clear how good these approaches perform compared to standard deep classification models trained with the appropriate settings. In this paper we propose to model a strong baseline for this task based on a simple and standard ResNet-18 architecture. To this end, we built on top of prior art by training the model with a standard preprocessing strategy but using images from several public sources and an empirically calibrated data augmentation setting. To evaluate its performance, we covered multiple clinically relevant perspectives, including image and patient level DR screening, discriminating responses by input quality and DR grade, assessing model uncertainties and analyzing its results in a qualitative manner. With no other methodological innovation than a carefully designed training, our ResNet model achieved an AUC = 0.955 (0.953 - 0.956) on a combined test set of 61007 test images from different public datasets, which is in line or even better than what other more complex deep learning models reported in the literature. Similar AUC values were obtained in 480 images from two separate in-house databases specially prepared for this study, which emphasize its generalization ability. This confirms that standard networks can still be strong baselines for this task if properly trained.
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
Dimensionality Reduction using Elastic Measures
Authors:
J. Derek Tucker,
Matthew T. Martinez,
Jose M. Laborde
Abstract:
With the recent surge in big data analytics for hyper-dimensional data there is a renewed interest in dimensionality reduction techniques for machine learning applications. In order for these methods to improve performance gains and understanding of the underlying data, a proper metric needs to be identified. This step is often overlooked and metrics are typically chosen without consideration of t…
▽ More
With the recent surge in big data analytics for hyper-dimensional data there is a renewed interest in dimensionality reduction techniques for machine learning applications. In order for these methods to improve performance gains and understanding of the underlying data, a proper metric needs to be identified. This step is often overlooked and metrics are typically chosen without consideration of the underlying geometry of the data. In this paper, we present a method for incorporating elastic metrics into the t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). We apply our method to functional data, which is uniquely characterized by rotations, parameterization, and scale. If these properties are ignored, they can lead to incorrect analysis and poor classification performance. Through our method we demonstrate improved performance on shape identification tasks for three benchmark data sets (MPEG-7, Car data set, and Plane data set of Thankoor), where we achieve 0.77, 0.95, and 1.00 F1 score, respectively.
△ Less
Submitted 19 January, 2023; v1 submitted 7 September, 2022;
originally announced September 2022.