-
Small Encoders Can Rival Large Decoders in Detecting Groundedness
Authors:
Istabrak Abbes,
Gabriele Prato,
Quentin Fournier,
Fernando Rodriguez,
Alaa Boukhary,
Adam Elwood,
Sarath Chandar
Abstract:
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness, that is, generating responses strictly supported by the context, is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task-specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at https://github.com/chandarlab/Hallucinate-less
Submitted 26 June, 2025;
originally announced June 2025.
-
Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs
Authors:
Elisa Forcada Rodríguez,
Olatz Perez-de-Viñaspre,
Jon Ander Campos,
Dietrich Klakow,
Vagrant Gautam
Abstract:
One of the goals of fairness research in NLP is to measure and mitigate stereotypical biases that are propagated by NLP systems. However, such work tends to focus on single axes of bias (most often gender) and the English language. Addressing these limitations, we contribute the first study of multilingual intersecting country and gender biases, with a focus on occupation recommendations generated by large language models. We construct a benchmark of prompts in English, Spanish and German, where we systematically vary country and gender, using 25 countries and four pronoun sets. Then, we evaluate a suite of 5 Llama-based models on this benchmark, finding that LLMs encode significant gender and country biases. Notably, we find that even when models show parity for gender or country individually, intersectional occupational biases based on both country and gender persist. We also show that the prompting language significantly affects bias, and instruction-tuned models consistently demonstrate the lowest and most stable levels of bias. Our findings highlight the need for fairness researchers to use intersectional and multilingual lenses in their work.
Submitted 5 May, 2025;
originally announced May 2025.
-
ChronoRoot 2.0: An Open AI-Powered Platform for 2D Temporal Plant Phenotyping
Authors:
Nicolás Gaggion,
Rodrigo Bonazzola,
María Florencia Legascue,
María Florencia Mammarella,
Florencia Sol Rodriguez,
Federico Emanuel Aballay,
Florencia Belén Catulo,
Andana Barrios,
Franco Accavallo,
Santiago Nahuel Villarreal,
Martin Crespi,
Martiniano María Ricardi,
Ezequiel Petrillo,
Thomas Blein,
Federico Ariel,
Enzo Ferrante
Abstract:
The analysis of plant developmental plasticity, including root system architecture, is fundamental to understanding plant adaptability and development, particularly in the context of climate change and agricultural sustainability. While significant advances have been made in plant phenotyping technologies, comprehensive temporal analysis of root development remains challenging, with most existing solutions providing either limited throughput or restricted structural analysis capabilities. Here, we present ChronoRoot 2.0, an integrated open-source platform that combines affordable hardware with advanced artificial intelligence to enable sophisticated temporal plant phenotyping. The system introduces several major advances, offering an integral perspective of seedling development: (i) simultaneous multi-organ tracking of six distinct plant structures, (ii) quality control through real-time validation, (iii) comprehensive architectural measurements including novel gravitropic response parameters, and (iv) dual specialized user interfaces for both architectural analysis and high-throughput screening. We demonstrate the system's capabilities through three use cases for Arabidopsis thaliana: characterization of circadian growth patterns under different light conditions, detailed analysis of gravitropic responses in transgenic plants, and high-throughput screening of etiolation responses across multiple genotypes. ChronoRoot 2.0 maintains its predecessor's advantages of low cost and modularity while significantly expanding its capabilities, making sophisticated temporal phenotyping more accessible to the broader plant science community. The system's open-source nature, combined with extensive documentation and containerized deployment options, ensures reproducibility and enables community-driven development of new analytical capabilities.
Submitted 20 April, 2025;
originally announced April 2025.
-
Filtro Adaptativo y Modulo de Grabacion en Dispositivo Para Mejora en la Calidad de Audicion
Authors:
Carlos Elihu Palomino Torres,
Francisco Claudio Chichipe Mondragon,
Frank Antonio Siesquen Rodriguez,
Mariana Alexandra Huaynate Leon
Abstract:
This project presents the development of a real-time auditory enhancement system utilizing an ESP32, an LMS adaptive filter, and artificial intelligence techniques. An I2S INMP44 microphone captures the sound, which is dynamically processed to suppress noise before being played through a MAX98357 speaker. The system continuously adapts to varying acoustic environments, ensuring improved speech clarity and an optimized listening experience.
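The least-mean-squares (LMS) update mentioned in this abstract can be illustrated with a short sketch. This is the textbook LMS recursion in plain Python, not the authors' ESP32 firmware; the function name `lms_filter`, the tap count, the step size `mu`, and the toy two-tap system being identified are all assumptions chosen for illustration.

```python
import random

def lms_filter(reference, desired, num_taps=4, mu=0.05):
    """Textbook LMS: adapt FIR weights so the filtered reference tracks `desired`."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps          # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, desired):
        window = [x] + window[:-1]     # shift the new sample into the delay line
        y = sum(w * v for w, v in zip(weights, window))   # filter output
        e = d - y                      # estimation error
        # Stochastic-gradient step: w <- w + mu * e * x
        weights = [w + mu * e * v for w, v in zip(weights, window)]
        errors.append(e)
    return weights, errors

# Toy demonstration (hypothetical data): recover an unknown 2-tap system.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
delayed = [0.0] + reference[:-1]
desired = [0.5 * x - 0.3 * prev for x, prev in zip(reference, delayed)]

weights, errors = lms_filter(reference, desired)
print([round(w, 3) for w in weights])
```

In a noise-suppression arrangement, `desired` would typically be the noisy microphone signal and `reference` a signal correlated with the noise, so that the error sequence serves as the cleaned output; whether the project uses that arrangement is not stated in the abstract.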
Submitted 24 February, 2025;
originally announced February 2025.
-
Efficient variable-length hanging tether parameterization for marsupial robot planning in 3D environments
Authors:
S. Martínez-Rozas,
D. Alejo,
F. Caballero,
L. Merino,
M. A. Pérez-Cutiño,
F. Rodriguez,
V. Sánchez-Canales,
I. Ventura,
J. M. Díaz-Bañez
Abstract:
This paper presents a novel approach to efficiently parameterize and estimate the state of a hanging tether for path and trajectory planning of a UGV tied to a UAV in a marsupial configuration. Most implementations in the state of the art assume a taut tether or make use of the catenary curve to model the shape of the hanging tether. The catenary model is complex to compute and must be instantiated thousands of times during the planning process, becoming a time-consuming task, while the taut tether assumption simplifies the problem, but might overly restrict the movement of the platforms. In order to accelerate the planning process, this paper proposes defining an analytical model to efficiently compute the hanging tether state, and a method to get a tether state parameterization free of collisions. We exploit the existing similarity between the catenary and parabola curves to derive analytical expressions of the tether state.
Submitted 6 February, 2025;
originally announced February 2025.
-
Algorithmic Clustering based on String Compression to Extract P300 Structure in EEG Signals
Authors:
Guillermo Sarasa,
Ana Granados,
Francisco B Rodríguez
Abstract:
P300 is an Event-Related Potential widely used in Brain-Computer Interfaces, but its detection is challenging due to inter-subject and temporal variability. This work introduces a clustering methodology based on Normalized Compression Distance (NCD) to extract the P300 structure, ensuring robustness against variability. We propose a novel signal-to-ASCII transformation to generate compression-friendly objects, which are then clustered using a hierarchical tree-based method and a multidimensional projection approach. Experimental results on two datasets demonstrate the method's ability to reveal relevant P300 structures, showing clustering performance comparable to state-of-the-art approaches. Furthermore, analysis at the electrode level suggests that the method could assist in electrode selection for P300 detection. This compression-driven clustering methodology offers a complementary tool for EEG analysis and P300 identification.
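For readers unfamiliar with the Normalized Compression Distance, the standard definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The sketch below computes it with `zlib` as a stand-in compressor; the choice of compressor, the helper names, and the toy byte strings are assumptions for illustration and do not reproduce the paper's signal-to-ASCII transformation or clustering pipeline.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(.))."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)       # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy objects (hypothetical): a highly repetitive string and pseudo-random bytes.
repetitive = b"the quick brown fox jumps over the lazy dog " * 40
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(repetitive)))

d_same = ncd(repetitive, repetitive)   # object compared with itself: small distance
d_diff = ncd(repetitive, noise)        # unrelated objects: distance close to 1
print(d_same, d_diff)
```

A pairwise matrix of such distances is the usual input to a hierarchical clustering or projection method.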
Submitted 31 January, 2025;
originally announced February 2025.
-
Discovering Dataset Nature through Algorithmic Clustering based on String Compression
Authors:
Ana Granados,
Kostadin Koroutchev,
Francisco de Borja Rodríguez
Abstract:
Text datasets can be represented using models that do not preserve text structure, or using models that preserve text structure. Our hypothesis is that depending on the dataset nature, there can be advantages in using a model that preserves text structure over one that does not, and vice versa. The key is to determine the best way of representing a particular dataset, based on the dataset itself. In this work, we propose to investigate this problem by combining text distortion and algorithmic clustering based on string compression. Specifically, a distortion technique previously developed by the authors is applied to destroy text structure progressively. Following this, a clustering algorithm based on string compression is used to analyze the effects of the distortion on the information contained in the texts. Several experiments are carried out on text datasets and artificially-generated datasets. The results show that in strongly structural datasets the clustering results worsen as text structure is progressively destroyed. Besides, they show that using a compressor which enables the choice of the size of the left-context symbols helps to determine the nature of the datasets. Finally, the results are contrasted with a method based on multidimensional projections and analogous conclusions are obtained.
Submitted 31 January, 2025;
originally announced February 2025.
-
Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts
Authors:
Arturo Rodriguez,
Ashesh Chattopadhyay,
Piyush Kumar,
Luis F. Rodriguez,
Vinod Kumar
Abstract:
Physics-informed neural networks (PINNs) commonly address ill-posed inverse problems by uncovering unknown physics. This study presents a novel unsupervised learning framework that identifies spatial subdomains with specific governing physics. It uses the partition of unity networks (POUs) to divide the space into subdomains, assigning unique nonlinear model parameters to each, which are integrated into the physics model. A vital feature of this method is a physics residual-based loss function that detects variations in physical properties without requiring labeled data. This approach enables the discovery of spatial decompositions and nonlinear parameters in partial differential equations (PDEs), optimizing the solution space by dividing it into subdomains and improving accuracy. Its effectiveness is demonstrated through applications in porous media thermal ablation and ice-sheet modeling, showcasing its potential for tackling real-world physics challenges.
Submitted 7 December, 2024;
originally announced December 2024.
-
RTify: Aligning Deep Neural Networks with Human Behavioral Decisions
Authors:
Yu-Ang Cheng,
Ivan Felipe Rodriguez,
Sixuan Chen,
Kohitij Kar,
Takeo Watanabe,
Thomas Serre
Abstract:
Current neural network models of primate vision focus on replicating overall levels of behavioral accuracy, often neglecting perceptual decisions' rich, dynamic nature. Here, we introduce a novel computational framework to model the dynamics of human behavioral choices by learning to align the temporal dynamics of a recurrent neural network (RNN) to human reaction times (RTs). We describe an approximation that allows us to constrain the number of time steps an RNN takes to solve a task with human RTs. The approach is extensively evaluated against various psychophysics experiments. We also show that the approximation can be used to optimize an "ideal-observer" RNN model to achieve an optimal tradeoff between speed and accuracy without human data. The resulting model is found to account well for human RT data. Finally, we use the approximation to train a deep learning implementation of the popular Wong-Wang decision-making model. The model is integrated with a convolutional neural network (CNN) model of visual processing and evaluated using both artificial and natural image stimuli. Overall, we present a novel framework that helps align current vision models with human behavior, bringing us closer to an integrated model of human vision.
Submitted 26 December, 2024; v1 submitted 5 November, 2024;
originally announced November 2024.
-
Advancing Free-Space Optical Communication System Architecture: Performance Analysis of Varied Optical Ground Station Network Configurations
Authors:
Eugene Rotherham,
Connor Casey,
Eva Fernandez Rodriguez,
Karen Wendy Vidaurre Torrez,
Maren Mashor,
Isaac Pike
Abstract:
This study discusses the current state of FSO technology, as well as global trends and developments in the industrial ecosystem, to identify obstacles to the full realization of optical space-to-ground communication networks. Additionally, link performance and network availability trade-off studies are presented, comparing overall system performance between portable and large OGS networks in conjunction with a constellation of small low Earth orbit (LEO) satellites. The paper provides an up-to-date overview and critical analysis of the FSO industry and assesses the feasibility of low-cost portable terminals as an alternative to larger high-capacity OGS systems. This initiative aims to better inform optical communications stakeholders, including governments, academic institutions, satellite operators, manufacturers, and communication service providers.
Submitted 30 October, 2024;
originally announced October 2024.
-
Impact of Usability Mechanisms: A Family of Experiments on Efficiency, Effectiveness and User Satisfaction
Authors:
Juan M. Ferreira,
Francy Rodríguez,
Adrián Santos,
Silvia T. Acuña,
Natalia Juristo
Abstract:
Context: The usability software quality attribute aims to improve system user performance. In a previous study, we found evidence of the impact of a set of usability characteristics from the viewpoint of users in terms of efficiency, effectiveness and satisfaction. However, the impact level appears to depend on the usability feature and suggests priorities with respect to their implementation depending on how they promote user performance. Objectives: We use a family of three experiments to increase the precision and generalization of the results in the baseline experiment and provide findings on the impact on user performance of the Abort Operation, Progress Feedback and Preferences usability mechanisms. Method: We conduct two replications of the baseline experiment in academic settings. We analyse the data of 367 experimental subjects and apply aggregation (meta-analysis) procedures. Results: We find that the Abort Operation and Preferences usability mechanisms appear to improve system usability a great deal with respect to efficiency, effectiveness and user satisfaction. Conclusions: We find that the family of experiments further corroborates the results of the baseline experiment. Most of the results are statistically significant, and, because of the large number of experimental subjects, the evidence that we gathered in the replications is sufficient to outweigh other experiments.
Submitted 22 August, 2024;
originally announced August 2024.
-
MASPA: An efficient strategy for path planning with a tethered marsupial robotics system
Authors:
Jesús Capitán,
José M. Díaz-Báñez,
Miguel A. Pérez-Cutiño,
Fabio Rodríguez,
Inmaculada Ventura
Abstract:
A tethered marsupial robotics system comprises three components: an Unmanned Ground Vehicle (UGV), an Unmanned Aerial Vehicle (UAV), and a tether connecting both robots. Marsupial systems are highly beneficial in industry as they extend the UAV's battery life during flight. This paper introduces a novel strategy for a specific path planning problem in marsupial systems, where each of the three components must avoid collisions with ground and aerial obstacles modeled as 3D cuboids. Given an initial configuration in which the UAV is positioned atop the UGV, the goal is to reach an aerial target with the UAV. We assume that the UGV first moves to a position from which the UAV can take off and fly through a vertical plane to reach an aerial target. We propose an approach that discretizes the space to approximate an optimal solution, minimizing the sum of the lengths of the ground and air paths. First, we assume a taut tether and use a novel algorithm that leverages the convexity of the tether and the geometry of obstacles to efficiently determine the locus of feasible take-off points for the UAV. We then apply this result to scenarios that involve loose tethers. The simulation test results show that our approach can solve complex situations in seconds, outperforming a baseline planning algorithm based on RRT* (Rapidly exploring Random Trees).
Submitted 23 December, 2024; v1 submitted 4 August, 2024;
originally announced August 2024.
-
BraTS-PEDs: Results of the Multi-Consortium International Pediatric Brain Tumor Segmentation Challenge 2023
Authors:
Anahita Fathi Kazerooni,
Nastaran Khalili,
Xinyang Liu,
Debanjan Haldar,
Zhifan Jiang,
Anna Zapaishchykova,
Julija Pavaine,
Lubdha M. Shah,
Blaise V. Jones,
Nakul Sheth,
Sanjay P. Prabhu,
Aaron S. McAllister,
Wenxin Tu,
Khanak K. Nandolia,
Andres F. Rodriguez,
Ibraheem Salman Shaikh,
Mariana Sanchez Montano,
Hollie Anne Lai,
Maruf Adewole,
Jake Albrecht,
Udunna Anazodo,
Hannah Anderson,
Syed Muhammed Anwar,
Alejandro Aristizabal,
Sina Bagheri, et al. (55 additional authors not shown)
Abstract:
Pediatric central nervous system tumors are the leading cause of cancer-related deaths in children. The five-year survival rate for high-grade glioma in children is less than 20%. The development of new treatments is dependent upon multi-institutional collaborative clinical trials requiring reproducible and accurate centralized response assessment. We present the results of the BraTS-PEDs 2023 challenge, the first Brain Tumor Segmentation (BraTS) challenge focused on pediatric brain tumors. This challenge utilized data acquired from multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. BraTS-PEDs 2023 aimed to evaluate volumetric segmentation algorithms for pediatric brain gliomas from magnetic resonance imaging using standardized quantitative performance evaluation metrics employed across the BraTS 2023 challenges. The top-performing AI approaches for pediatric tumor analysis included ensembles of nnU-Net and Swin UNETR, Auto3DSeg, or nnU-Net with a self-supervised framework. The BraTS-PEDs 2023 challenge fostered collaboration between clinicians (neuro-oncologists, neuroradiologists) and AI/imaging scientists, promoting faster data sharing and the development of automated volumetric analysis techniques. These advancements could significantly benefit clinical trials and improve the care of children with brain tumors.
Submitted 16 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Towards a Robotic Intrusion Prevention System: Combining Security and Safety in Cognitive Social Robots
Authors:
Francisco Martín,
Enrique Soriano-Salvador,
José Miguel Guerrero,
Gorka Guardiola Múzquiz,
Juan Carlos Manzanares,
Francisco J. Rodríguez
Abstract:
Social robots need to be safe and reliable to share their space with humans. This paper reports on the first results of a research project that aims to create safer and more reliable intelligent autonomous robots by investigating the implications of, and interactions between, cybersecurity and safety. We propose creating a robotic intrusion prevention system (RIPS) that follows a novel approach to detect and mitigate intrusions in cognitive social robot systems and other cyber-physical systems. The RIPS detects threats at the robotic communication level and enables mitigation of the cyber-physical threats by using System Modes to define what part of the robotic system reduces or limits its functionality while the system is compromised. We demonstrate the validity of our approach by applying it to a cognitive architecture running in a real social robot that preserves the privacy and safety of humans while facing several cyber attack situations.
Submitted 9 July, 2024;
originally announced July 2024.
-
Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs
Authors:
Edison Jair Bejarano Sepulveda,
Nicolai Potes Hector,
Santiago Pineda Montoya,
Felipe Ivan Rodriguez,
Jaime Enrique Orduy,
Alec Rosales Cabezas,
Danny Traslaviña Navarrete,
Sergio Madrid Farfan
Abstract:
This paper explores the potential of large language models (LLMs) to make the Aeronautical Regulations of Colombia (RAC) more accessible. Given the complexity and extensive technicality of the RAC, this study introduces a novel approach to simplifying these regulations for broader understanding. By developing the first-ever RAC database, which contains 24,478 expertly labeled question-and-answer pairs, and fine-tuning LLMs specifically for RAC applications, the paper outlines the methodology for dataset assembly, expert-led annotation, and model training. Utilizing the Gemma1.1 2b model along with advanced techniques like Unsloth for efficient VRAM usage and flash attention mechanisms, the research aims to expedite training processes. This initiative establishes a foundation to enhance the comprehensibility and accessibility of RAC, potentially benefiting novices and reducing dependence on expert consultations for navigating the aviation industry's regulatory landscape.
The dataset (https://huggingface.co/datasets/somosnlp/ColombiaRAC_FullyCurated) and the model (https://huggingface.co/somosnlp/gemma-1.1-2b-it_ColombiaRAC_FullyCurated_format_chatML_V1) are available online.
Submitted 14 May, 2024;
originally announced May 2024.
-
Detection of the most influential variables for preventing postpartum urinary incontinence using machine learning techniques
Authors:
José Alberto Benítez-Andrades,
María Teresa García-Ordás,
María Álvarez-González,
Raquel Leirós-Rodríguez,
Ana F López Rodríguez
Abstract:
Background: Postpartum urinary incontinence (PUI) is a common issue among postnatal women. Previous studies identified potential related variables, but lacked analysis on certain intrinsic and extrinsic patient variables during pregnancy.
Objective: The study aims to evaluate the most influential variables in PUI using machine learning, focusing on intrinsic, extrinsic, and combined variable groups.
Methods: Data from 93 pregnant women were analyzed using machine learning and oversampling techniques. Four key variables were predicted: occurrence, frequency, intensity of urinary incontinence, and stress urinary incontinence.
Results: Models using extrinsic variables were most accurate, with 70% accuracy for urinary incontinence, 77% for frequency, 71% for intensity, and 93% for stress urinary incontinence.
Conclusions: The study highlights extrinsic variables as significant predictors of PUI issues. This suggests that PUI prevention might be achievable through healthy habits during pregnancy, although further research is needed for confirmation.
Submitted 14 February, 2024;
originally announced February 2024.
-
The GREENBOT dataset: Multimodal mobile robotic dataset for a typical Mediterranean greenhouse
Authors:
Fernando Cañadas-Aránega,
Jose Luis Blanco-Claraco,
Jose Carlos Moreno,
Francisco Rodriguez
Abstract:
This paper introduces an innovative dataset specifically crafted for challenging agricultural settings (a greenhouse), where achieving precise localization is of paramount importance. The dataset was gathered using a mobile platform equipped with a set of sensors typically used in mobile robots, as it was moved through all the corridors of a typical Mediterranean greenhouse featuring a tomato crop. This dataset presents a unique opportunity for constructing detailed 3D models of plants in such an indoor-like space, with potential applications such as robotized spraying. For the first time, to the best of the authors' knowledge, a dataset suitable for testing Simultaneous Localization and Mapping (SLAM) methods is presented in a greenhouse environment, which poses unique challenges. The suitability of the dataset for this goal is assessed by presenting SLAM results with state-of-the-art algorithms. The dataset is available online at https://arm.ual.es/arm-group/dataset-greenhouse-2024/.
Submitted 1 February, 2024;
originally announced February 2024.
-
Benchmarking Particle Filter Algorithms for Efficient Velodyne-Based Vehicle Localization
Authors:
Jose Luis Blanco-Claraco,
Francisco Mañas-Alvarez,
Jose Luis Torres-Moreno,
Francisco Rodriguez,
Antonio Gimenez-Fernandez
Abstract:
Keeping a vehicle well-localized within a prebuilt map is at the core of any autonomous vehicle navigation system. In this work, we show that both standard SIR sampling and rejection-based optimal sampling are suitable for efficient (10 to 20 ms) real-time pose tracking without feature detection, that is, using raw point clouds from a 3D LiDAR. Motivated by the large amount of information captured by these sensors, we perform a systematic statistical analysis of how many points are actually required to reach an optimal ratio between efficiency and positioning accuracy. Furthermore, for initialization under adverse conditions, e.g., poor GPS signal in urban canyons, we identify the optimal particle filter settings required to ensure convergence. Our findings include that a decimation factor between 100 and 200 on incoming point clouds provides large savings in computational cost with a negligible loss in localization accuracy for a VLP-16 scanner. Furthermore, an initial density of $\sim$2 particles/m$^2$ is required to achieve 100% convergence success for large-scale ($\sim$100,000 m$^2$), outdoor global localization without any additional hint from GPS or magnetic field sensors. All implementations have been released as open-source software.
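As a rough illustration of the figures quoted in this abstract, the sketch below works out the implied initial particle count and the effect of decimating a scan. It is not the authors' released implementation: the function names, the uniform-density assumption, and the simple stride-based decimation are all hypothetical choices made for this example.

```python
import math


def initial_particle_count(area_m2: float, density_per_m2: float = 2.0) -> int:
    """Particles needed to seed global localization at a uniform density.

    The default of 2 particles/m^2 is the approximate density quoted in the
    abstract; spreading particles uniformly is an assumption of this sketch.
    """
    return math.ceil(area_m2 * density_per_m2)


def decimate(points: list, factor: int) -> list:
    """Keep every `factor`-th point of a scan.

    Stride-based decimation is an assumption; the abstract only states the
    decimation factor, not how points are selected.
    """
    if factor < 1:
        raise ValueError("decimation factor must be >= 1")
    return points[::factor]


# About 100,000 m^2 at about 2 particles/m^2 implies 200,000 initial particles.
n_particles = initial_particle_count(100_000)

# A hypothetical 30,000-point scan decimated by 100 keeps 300 points.
scan = list(range(30_000))
kept = decimate(scan, 100)
```

The arithmetic shows why both settings matter together: the per-update cost grows with the number of particles times the number of points evaluated per particle, so a large initial population is only affordable when each scan is heavily decimated.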
Submitted 16 January, 2024;
originally announced January 2024.
-
Hallucination-minimized Data-to-answer Framework for Financial Decision-makers
Authors:
Sohini Roychowdhury,
Andres Alvarez,
Brian Moore,
Marko Krema,
Maria Paz Gelpi,
Federico Martin Rodriguez,
Angel Rodriguez,
Jose Ramon Cabrejas,
Pablo Martinez Serrano,
Punit Agrawal,
Arijit Mukherjee
Abstract:
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. However, scaling such prototypes to robust products with minimized hallucinations or fake responses remains an open challenge, especially in niche, data-table-heavy domains such as financial decision making. In this work, we present a novel Langchain-based framework that transforms data tables into hierarchical textual data chunks to enable a wide variety of actionable question answering. First, the user queries are classified by intention, followed by automated retrieval of the most relevant data chunks to generate customized LLM prompts per query. Next, the custom prompts and their responses undergo multi-metric scoring to assess hallucinations and response confidence. The proposed system is optimized with user-query intention classification, advanced prompting, and data scaling capabilities, and it achieves confidence scores above 90% for responses to a variety of user queries, ranging over {What, Where, Why, How, predict, trend, anomalies, exceptions}, that are crucial for financial decision-making applications. The proposed data-to-answer framework can be extended to other analytical domains such as sales and payroll to ensure optimal hallucination control guardrails.
Submitted 9 November, 2023;
originally announced November 2023.
-
MolecularWebXR: Multiuser discussions about chemistry and biology in immersive and inclusive VR
Authors:
Fabio J. Cortes Rodriguez,
Gianfranco Frattini,
Fernando Teixeira Pinto Meireles,
Danae A. Terrien,
Sergio Cruz-Leon,
Matteo Dal Peraro,
Eva Schier,
Diego M. Moreno,
Luciano A. Abriata
Abstract:
MolecularWebXR is our new website for education, science communication and scientific peer discussion in chemistry and biology built on WebXR. It democratizes multi-user, inclusive virtual reality (VR) experiences that are deeply immersive for users wearing high-end headsets, yet allow participation by users with consumer devices such as smartphones, possibly inserted into cardboard goggles for immersivity, or even computers or tablets. With no installs as it is all web-served, MolecularWebXR enables multiple users to simultaneously explore, communicate and discuss chemistry and biology concepts in immersive 3D environments, manipulating objects with their bare hands, either present in the same real space or scattered throughout the globe thanks to built-in audio features. A series of preset rooms cover educational material on chemistry and structural biology, and an empty room can be populated with material prepared ad hoc using moleculARweb's VMD-based PDB2AR tool. We verified ease of use and versatility by users aged 12-80 in entirely virtual sessions or mixed real-virtual sessions at science outreach events, student instruction, scientific collaborations, and conference lectures. MolecularWebXR is available for free use without registration at https://molecularwebxr.org, and a blog post version of this preprint with embedded videos is available at https://go.epfl.ch/molecularwebxr-blog-post.
Submitted 1 November, 2023;
originally announced November 2023.
-
Designing a Hybrid Neural System to Learn Real-world Crack Segmentation from Fractal-based Simulation
Authors:
Achref Jaziri,
Martin Mundt,
Andres Fernandez Rodriguez,
Visvanathan Ramesh
Abstract:
Identification of cracks is essential to assess the structural integrity of concrete infrastructure. However, robust crack segmentation remains a challenging task for computer vision systems due to the diverse appearance of concrete surfaces, variable lighting and weather conditions, and the overlapping of different defects. In particular, recent data-driven methods struggle with the limited availability of data and the fine-grained, time-consuming nature of crack annotation, and subsequently have difficulty generalizing to out-of-distribution samples. In this work, we move past these challenges in a two-fold way. We introduce a high-fidelity crack graphics simulator based on fractals and a corresponding fully annotated crack dataset. We then complement the latter with a system that learns generalizable representations from simulation by leveraging both a pointwise mutual information estimate and adaptive instance normalization as inductive biases. Finally, we empirically highlight how different design choices are symbiotic in bridging the simulation-to-real gap, and ultimately demonstrate that our introduced system can effectively handle real-world crack segmentation.
Submitted 18 September, 2023;
originally announced September 2023.
-
Directed Scattering for Knowledge Graph-based Cellular Signaling Analysis
Authors:
Aarthi Venkat,
Joyce Chew,
Ferran Cardoso Rodriguez,
Christopher J. Tape,
Michael Perlmutter,
Smita Krishnaswamy
Abstract:
Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.
Submitted 14 September, 2023;
originally announced September 2023.
-
Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex
Authors:
Drew Linsley,
Ivan F. Rodriguez,
Thomas Fel,
Michael Arcaro,
Saloni Sharma,
Margaret Livingstone,
Thomas Serre
Abstract:
One of the most impactful findings in computational neuroscience over the past decade is that the object recognition accuracy of deep neural networks (DNNs) correlates with their ability to predict neural responses to natural images in the inferotemporal (IT) cortex. This discovery supported the long-held theory that object recognition is a core objective of the visual cortex, and suggested that more accurate DNNs would serve as better models of IT neuron responses to images. Since then, deep learning has undergone a revolution of scale: billion parameter-scale DNNs trained on billions of images are rivaling or outperforming humans at visual tasks including object recognition. Have today's DNNs become more accurate at predicting IT neuron responses to images as they have grown more accurate at object recognition?
Surprisingly, across three independent experiments, we find this is not the case. DNNs have become progressively worse models of IT as their accuracy has increased on ImageNet. To understand why DNNs experience this trade-off and evaluate if they are still an appropriate paradigm for modeling the visual system, we turn to recordings of IT that capture spatially resolved maps of neuronal activity elicited by natural images. These neuronal activity maps reveal that DNNs trained on ImageNet learn to rely on different visual features than those encoded by IT and that this problem worsens as their accuracy increases. We successfully resolved this issue with the neural harmonizer, a plug-and-play training routine for DNNs that aligns their learned representations with humans. Our results suggest that harmonized DNNs break the trade-off between ImageNet accuracy and neural prediction accuracy that assails current DNNs and offer a path to more accurate models of biological vision.
Submitted 6 June, 2023;
originally announced June 2023.
-
Optimal Placement of Base Stations in Border Surveillance using Limited Capacity Drones
Authors:
S. Bereg,
J. M. Díaz-Báñez,
M. Haghpanah,
P. Horn,
M. A. Lopez,
N. Marín,
A. Ramírez-Vigueras,
F. Rodríguez,
O. Solé-Pi,
A. Stevens,
J. Urrutia
Abstract:
Imagine an island modeled as a simple polygon $\mathcal P$ with $n$ vertices whose coastline we wish to monitor. We consider the problem of building the minimum number of refueling stations along the boundary of $\mathcal P$ in such a way that a drone can follow a polygonal route enclosing the island without running out of fuel. A drone can fly a maximum distance $d$ between consecutive stations and is restricted to move either along the boundary of $\mathcal P$ or its exterior (i.e., over water). We present an algorithm that, given $\mathcal P$, finds the locations for a set of refueling stations whose cardinality is at most the optimal plus one. The time complexity of this algorithm is $O(n^2 + \frac{L}{d} n)$, where $L$ is the length of $\mathcal P$. We also present an algorithm that returns an additive $ε$-approximation for the problem of minimizing the fuel capacity required for the drones when we are allowed to place $k$ base stations around the boundary of the island; this algorithm also finds the locations of these refueling stations. Finally, we propose a practical discretization heuristic which, under certain conditions, can be used to certify optimality of the results.
Submitted 27 September, 2022;
originally announced September 2022.
-
Scaling and compressing melodies using geometric similarity measures
Authors:
Luis Evaristo Caraballo,
José Miguel Díaz-Báñez,
Fabio Rodríguez,
Vanesa Sánchez-Canales,
Inmaculada Ventura
Abstract:
Melodic similarity measurement is of key importance in music information retrieval. In this paper, we use geometric matching techniques to measure the similarity between two melodies. We represent music as sets of points or sets of horizontal line segments in the Euclidean plane and propose efficient algorithms for optimization problems inspired by two operations on melodies: linear scaling and audio compression. In the scaling problem, an incoming query melody is scaled forward until the similarity measure between the query and a reference melody is minimized. The compression problem asks for a subset of notes of a given melody such that the matching cost between the selected notes and the reference melody is minimized.
Submitted 19 September, 2022;
originally announced September 2022.
-
Portable Multi-Hypothesis Monte Carlo Localization for Mobile Robots
Authors:
Alberto Garcia,
Francisco Martin,
Jose Miguel Guerrero,
Francisco J. Rodriguez,
Vicente Matellan
Abstract:
Self-localization is a fundamental capability that mobile robot navigation systems integrate to move from one point to another using a map. Thus, any enhancement in localization accuracy is crucial to perform delicate dexterity tasks. This paper describes a new localization method that maintains several populations of particles using the Monte Carlo Localization (MCL) algorithm, always choosing the best one as the system's output. As novelties, our work includes a multi-scale match matching algorithm to create new MCL populations and a metric to determine the most reliable one. It also contributes to state-of-the-art implementations, enhancing recovery times from erroneous estimates or unknown initial positions. The proposed method is evaluated in ROS2 in a module fully integrated with Nav2 and compared with the current state-of-the-art Adaptive MCL (AMCL) solution, obtaining good accuracy and recovery times.
Submitted 15 September, 2022;
originally announced September 2022.
-
Conviformers: Convolutionally guided Vision Transformer
Authors:
Mohit Vaishnav,
Thomas Fel,
Ivań Felipe Rodríguez,
Thomas Serre
Abstract:
Vision transformers are nowadays the de-facto choice for image classification tasks. There are two broad categories of classification tasks, fine-grained and coarse-grained. In fine-grained classification, the necessity is to discover subtle differences due to the high level of similarity between sub-classes. Such distinctions are often lost as we downscale the image to save the memory and computational cost associated with vision transformers (ViT). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for a better augmentation technique and the ability of modern-day neural networks to handle higher dimensional images. We also introduce a convolutional transformer architecture called Conviformer which, unlike the popular Vision Transformer (ConViT), can handle higher resolution images without exploding memory and computational cost. We also introduce a novel, improved pre-processing technique called PreSizer to resize images better while preserving their original aspect ratios, which proved essential for classifying natural plants. With our simple yet effective approach, we achieved SoTA on Herbarium 202x and iNaturalist 2019 dataset.
Submitted 28 August, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
A deep learning model for classification of diabetic retinopathy in eye fundus images based on retinal lesion detection
Authors:
Melissa delaPava,
Hernán Ríos,
Francisco J. Rodríguez,
Oscar J. Perdomo,
Fabio A. González
Abstract:
Diabetic retinopathy (DR) is a complication of diabetes that affects the retina. It can cause blindness if left undiagnosed and untreated. An ophthalmologist performs the diagnosis by screening each patient and analyzing the retinal lesions via ocular imaging. In practice, such analysis is time-consuming and cumbersome to perform. This paper presents a model for automatic DR classification on eye fundus images. The approach identifies the main ocular lesions related to DR and subsequently diagnoses the illness. The proposed method follows the same workflow as clinicians, providing information that can be interpreted clinically to support the prediction. A subset of the Kaggle EyePACS and the Messidor-2 datasets, labeled with ocular lesions, is made publicly available. The Kaggle EyePACS subset is used as a training set and the Messidor-2 subset as a test set for the lesion and DR classification models. For DR diagnosis, our model has an area under the curve, sensitivity, and specificity of 0.948, 0.886, and 0.875, respectively, which is competitive with state-of-the-art approaches.
Submitted 14 October, 2021;
originally announced October 2021.
-
Detecting Oxbow Code in Erlang Codebases with the Highest Degree of Certainty
Authors:
Fernando Benavides Rodríguez,
Laura M. Castro
Abstract:
The presence of source code that is no longer needed is a handicap to project maintainability. The larger and longer-lived the project, the higher the chances of accumulating dead code in its different forms.
Manually detecting unused code is time-consuming, tedious, error-prone, and requires a great level of deep knowledge about the codebase. In this paper, we examine the kinds of dead code (specifically, oxbow code) that can appear in Erlang projects, and formulate rules to identify them with high accuracy.
We also present an open-source static analyzer that implements these rules, allowing for the automatic detection and confident removal of oxbow code in Erlang codebases, actively contributing to increasing their quality and maintainability.
Submitted 19 July, 2021;
originally announced July 2021.
-
PlanSys2: A Planning System Framework for ROS2
Authors:
Francisco Martín,
Jonatan Ginés,
Vicente Matellán,
Francisco J. Rodríguez
Abstract:
Autonomous robots need to plan the tasks they carry out to fulfill their missions. The missions' increasing complexity does not let human designers anticipate all the possible situations, so traditional control systems based on state machines are not enough. This paper describes the ROS2 Planning System (PlanSys2 for short), a framework for symbolic planning that incorporates novel approaches for execution on robots working in demanding environments. PlanSys2 aims to be the reference task planning framework in ROS2, the latest version of the de facto standard in robotics software development. Among its main features are the optimized execution of plans, based on Behavior Trees, through a new actions auction protocol, and its multi-robot planning capabilities. It already has a small but growing community of users and developers, and this document is a summary of the design and capabilities of this project.
Submitted 1 July, 2021;
originally announced July 2021.
-
Kinodynamic Planning for an Energy-Efficient Autonomous Ornithopter
Authors:
Fabio Rodríguez,
José-Miguel Díaz-Báñez,
Ernesto Sanchez-Laulhe,
Jesús Capitán,
Aníbal Ollero
Abstract:
This paper presents a novel algorithm to plan energy-efficient trajectories for autonomous ornithopters. In general, trajectory optimization is quite a relevant problem for practical applications with Unmanned Aerial Vehicles (UAVs). Even though the problem has been well studied for fixed-wing and rotary-wing vehicles, there are far fewer works exploring it for flapping-wing UAVs like ornithopters. These are of interest for many applications where both long flight endurance and hovering capabilities are required. We propose an efficient approach to plan ornithopter trajectories that minimize energy consumption by combining gliding and flapping maneuvers. Our algorithm builds a tree of dynamically feasible trajectories and applies heuristic search for efficient online planning, using reference curves to guide the search and prune states. We present computational experiments to analyze and tune key parameters, as well as a comparison against a recent alternative probabilistic planner, in which our approach performs best. Finally, we demonstrate how our algorithm can be used for planning perching maneuvers online.
Submitted 23 October, 2020;
originally announced October 2020.
-
Implementing AI-powered semantic character recognition in motor racing sports
Authors:
Jose David Fernández Rodríguez,
David Daniel Albarracín Molina,
Jesús Hormigo Cebolla
Abstract:
Oftentimes TV producers of motor-racing programs overlay visual and textual media to provide on-screen context about drivers, such as a driver's name, position or photo. Typically this is accomplished by a human producer who visually identifies the drivers on screen, manually toggling the contextual media associated to each one and coordinating with cameramen and other TV producers to keep the racer in the shot while the contextual media is on screen. This labor-intensive and highly dedicated process is mostly suited to static overlays and makes it difficult to overlay contextual information about many drivers at the same time in short shots. This paper presents a system that largely automates these tasks and enables dynamic overlays using deep learning to track the drivers as they move on screen, without human intervention. This system is not merely theoretical, but an implementation has already been deployed during live races by a TV production company at Formula E races. We present the challenges faced during the implementation and discuss the implications. Additionally, we cover future applications and roadmap of this new technological development.
Submitted 1 June, 2020;
originally announced June 2020.
-
Chapter: Vulnerability of Quantum Information Systems to Collective Manipulation
Authors:
Fernando J. Gómez-Ruiz,
Ferney J. Rodríguez,
Luis Quiroga,
Neil F. Johnson
Abstract:
The highly specialist terms 'quantum computing' and 'quantum information', together with the broader term 'quantum technologies', now appear regularly in the mainstream media. While this is undoubtedly highly exciting for physicists and investors alike, a key question for society concerns such systems' vulnerabilities -- and in particular, their vulnerability to collective manipulation. Here we present and discuss a new form of vulnerability in such systems that we have identified based on detailed many-body quantum mechanical calculations. The impact of this new vulnerability is that groups of adversaries can maximally disrupt these systems' global quantum state which will then jeopardize their quantum functionality. It will be almost impossible to detect these attacks since they do not change the Hamiltonian and the purity remains the same; they do not entail any real-time communication between the attackers; and they can last less than a second. We also argue that there can be an implicit amplification of such attacks because of the statistical character of modern non-state actor groups. A countermeasure could be to embed future quantum technologies within redundant classical networks. We purposely structure the discussion in this chapter so that the first sections are self-contained and can be read by non-specialists.
Submitted 11 April, 2024; v1 submitted 25 January, 2019;
originally announced January 2019.
-
Creating Fair Models of Atherosclerotic Cardiovascular Disease Risk
Authors:
Stephen Pfohl,
Ben Marafino,
Adrien Coulet,
Fatima Rodriguez,
Latha Palaniappan,
Nigam H. Shah
Abstract:
Guidelines for the management of atherosclerotic cardiovascular disease (ASCVD) recommend the use of risk stratification models to identify patients most likely to benefit from cholesterol-lowering and other therapies. These models have differential performance across race and gender groups with inconsistent behavior across studies, potentially resulting in an inequitable distribution of beneficial therapy. In this work, we leverage adversarial learning and a large observational cohort extracted from electronic health records (EHRs) to develop a "fair" ASCVD risk prediction model with reduced variability in error rates across groups. We empirically demonstrate that our approach is capable of aligning the distribution of risk predictions conditioned on the outcome across several groups simultaneously for models built from high-dimensional EHR data. We also discuss the relevance of these results in the context of the empirical trade-off between fairness and model performance.
Submitted 14 June, 2019; v1 submitted 12 September, 2018;
originally announced September 2018.
-
Global Constraint Catalog, Volume II, Time-Series Constraints
Authors:
Ekaterina Arafailova,
Nicolas Beldiceanu,
Rémi Douence,
Mats Carlsson,
Pierre Flener,
María Andreína Francisco Rodríguez,
Justin Pearson,
Helmut Simonis
Abstract:
First this report presents a restricted set of finite transducers used to synthesise structural time-series constraints described by means of a multi-layered function composition scheme. Second it provides the corresponding synthesised catalogue of structural time-series constraints where each constraint is explicitly described in terms of automata with registers.
Submitted 18 September, 2018; v1 submitted 26 September, 2016;
originally announced September 2016.
-
Fair anonymity for the Tor network
Authors:
Jesus Diaz,
David Arroyo,
Francisco B. Rodriguez
Abstract:
Current anonymizing networks have become an important tool for guaranteeing users' privacy. However, these platforms can be used to perform illegitimate actions, which sometimes leads service providers to treat traffic coming from these networks as a probable threat. To solve this problem, we propose adding support for fairness mechanisms to the Tor network, specifically by introducing a slight modification to the key negotiation process with the entry and exit nodes, in the form of group signatures. By means of these signatures, we set up an access control method that prevents misbehaving users from using the Tor network. Additionally, we establish a predefined method for denouncing illegitimate actions, which prevents the proposed fairness mechanisms from being applied as a threat that erodes users' privacy. As a direct consequence, traffic coming from Tor would be considered less suspicious by service providers.
Submitted 15 December, 2014;
originally announced December 2014.
-
A Methodology for Performing Nested Automatic Differentiation
Authors:
Juan Luis Valerdi,
Fernando Raul Rodriguez
Abstract:
This paper proposes a framework for applying Nested Automatic Differentiation using any Automatic Differentiation library that allows operator overloading. To compute nested derivatives within a single evaluation of the function, which is assumed to be analytic, the forward mode is used with a new structure called SuperAdouble. This new class guarantees that Automatic Differentiation is applied correctly and that the required value and derivative are computed.
Submitted 19 May, 2014;
originally announced May 2014.
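As a rough illustration of the idea in the abstract above (forward mode, operator overloading, nested derivatives in one evaluation), the following minimal Python sketch nests dual numbers inside dual numbers. It is not the SuperAdouble structure from the paper; the `Dual` class and the helper `first_and_second_derivative` are hypothetical names introduced here, and only addition and multiplication are supported.

```python
class Dual:
    """Dual number val + dot*eps with eps**2 == 0 (forward-mode AD).

    val and dot may themselves be Dual instances, which is what makes
    nesting (higher-order derivatives) possible in this sketch.
    """

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def _lift(x):
        # Treat plain numbers as constants (zero derivative).
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def first_and_second_derivative(f, x0):
    """Return (f(x0), f'(x0), f''(x0)) from one evaluation of f."""
    # Seed both nesting levels with derivative 1 with respect to x.
    x = Dual(Dual(x0, 1.0), Dual(1.0, 0.0))
    y = f(x)
    return y.val.val, y.val.dot, y.dot.dot


# Hypothetical test function: f(x) = x^3 + 2x, evaluated at x = 2.
result = first_and_second_derivative(lambda x: x * x * x + 2.0 * x, 2.0)
print(result)
```

The outer dual number tracks one differentiation level and the inner one tracks the next, so a single call to `f` yields the value and both derivatives.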
-
A Game Theory Interpretation for Multiple Access in Cognitive Radio Networks with Random Number of Secondary Users
Authors:
Oscar Filio Rodriguez,
Serguei Primak,
Valeri Kontorovich,
Abdallah Shami
Abstract:
This paper presents a new multiple access algorithm for cognitive radio networks based on game theory. We address the problem of a multiple access system in which the number of users and their types are unknown. To do this, the framework is modelled as a non-cooperative Poisson game in which none of the players knows the total number of devices participating (population uncertainty). We propose a scheme in which failed attempts to transmit (collisions) are penalized, and we calculate the optimum penalization in mixed strategies. The proposed scheme leads to a Nash equilibrium at which the maximum possible throughput is achieved.
Submitted 22 May, 2013;
originally announced May 2013.
-
Technical Report: CSVM Ecosystem
Authors:
Frédéric Rodriguez
Abstract:
The CSVM format is derived from the CSV format and allows the storage of tabular-like data with a limited but extensible amount of metadata. This approach could help computer scientists because all the information needed to subsequently use the data is included in the CSVM file. It is particularly well suited to handling RAW data in many scientific fields and to being used as a canonical format. The use of CSVM has shown that it greatly facilitates data management independently of databases, data exchange, the integration of RAW data in dataflows or calculation pipelines, and the search for best practices in RAW data management. The efficiency of this format is closely related to its plasticity: a generic frame is given for all kinds of data, and the CSVM parsers make no interpretation of data types. This task is done by the application layer, so it is possible to use the same format and the same parser code for many purposes. This document presents some implementations of the CSVM format used over ten years and in different laboratories. Some programming examples are also shown: a Python toolkit for using, manipulating, and querying the format is available. A first specification of this format (CSVM-1) is now defined, as well as some derivatives such as CSVM dictionaries used for data interchange. CSVM is an Open Format and could be used as a support for Open Data and the long-term conservation of RAW or unpublished data.
Submitted 11 September, 2012;
originally announced September 2012.
-
Technical report: CSVM dictionaries
Authors:
Frédéric Rodriguez
Abstract:
CSVM (CSV with Metadata) is a simple file format for tabular data. Its possible application domain is the same as that of typical spreadsheet files, but CSVM is well suited to long-term storage and the inter-conversion of RAW data. CSVM embeds different levels for data, metadata, and annotations in a human-readable format and flat ASCII files. As a proof of concept, Perl and Python toolkits were designed to handle CSVM data and objects in workflows. These parsers can process CSVM files independently of data types, so it is possible to use the same data format and parser for many scientific purposes. CSVM-1 is the first version of the CSVM specification. This paper presents an extension of CSVM-1 for implementing a translation system between CSVM files. The data needed to perform the translation are themselves coded in another CSVM file. This particular kind of CSVM file is called a CSVM dictionary; it is also readable by the current CSVM parser and is fully supported by the Python toolkit. This report presents a proposal for CSVM dictionaries, a working example in chemistry, and some elements of the Python toolkit that can be used to handle these files.
Submitted 8 August, 2012;
originally announced August 2012.
-
Technical Report: CSVM format for scientific tabular data
Authors:
Gérôme Beyries,
Frédéric Rodriguez
Abstract:
CSVM (CSV with metadata) is derived from the CSV format and is used for storing experimental data, models, and specifications. CSVM allows the storage of tabular data with a limited but extensible amount of metadata. This improves the exchange and long-term use of RAW data because all the information needed to subsequently use the data is included in the CSVM file. Basic CSVM files are readable by current tools for handling tables (e.g., spreadsheets). Using the full possibilities of the concept, it is possible to deviate from a strict table and also to annotate inside the data block. CSVM files are pure ASCII files and could provide a template for implementing best practices in handling raw data at the laboratory level, in exchanges between data sources, in long-term resources, or in collaborative processes, particularly when different scientific fields are involved. In this document we describe the first release of the CSVM format (CSVM-1).
Submitted 24 July, 2012;
originally announced July 2012.
-
Cryptanalysis of a one round chaos-based Substitution Permutation Network
Authors:
David Arroyo,
Jesus Diaz,
F. B. Rodriguez
Abstract:
The interleaving of chaos and cryptography has been the aim of a large set of works since the beginning of the nineties. Many encryption proposals have been introduced to improve conventional cryptography. However, many proposals possess serious problems according to the basic requirements for the secure exchange of information. In this paper we highlight some of the main problems of chaotic cryptography by means of the analysis of a very recent chaotic cryptosystem based on a one round Substitution Permutation Network. More specifically, we show that it is not possible to avoid the security problems of that encryption architecture just by including a chaotic system as core of the derived encryption system.
Submitted 2 April, 2012; v1 submitted 30 March, 2012;
originally announced March 2012.
-
A formal methodology for integral security design and verification of network protocols
Authors:
Jesus Diaz,
David Arroyo,
Francisco B. Rodriguez
Abstract:
We propose a methodology for verifying security properties of network protocols at the design level. It can be separated into two main parts: context and requirements analysis with informal verification, and formal representation with procedural verification. It is an iterative process in which the early steps are simpler than the later ones. Therefore, the effort required to detect flaws is proportional to the complexity of the associated attack. Thus, we avoid wasting valuable resources on simple flaws that can be detected early in the verification process. To illustrate the advantages provided by our methodology, we also analyze three real protocols.
Submitted 6 September, 2012; v1 submitted 26 January, 2012;
originally announced January 2012.
-
A PCA-Based Super-Resolution Algorithm for Short Image Sequences
Authors:
Carlos Miravet,
Francisco B. Rodríguez
Abstract:
In this paper, we present a novel, learning-based, two-step super-resolution (SR) algorithm well suited to solving the especially demanding problem of obtaining SR estimates from short image sequences. The first step, devoted to increasing the sampling rate of the incoming images, is performed by fitting linear combinations of functions generated from principal components (PCs) to locally reproduce the sparse projected image data, and using these models to estimate image values at nodes of the high-resolution grid. The PCs were obtained from local image patches sampled at sub-pixel level, which were generated in turn from a database of high-resolution images by applying a physically realistic observation model. Continuity between local image models is enforced by minimizing an adequate functional in the space of model coefficients. The second step, dealing with restoration, is performed by a linear filter with coefficients learned to restore residual interpolation artifacts in addition to low-resolution blurring, providing an effective coupling between both steps of the method. Results on a demanding five-image scanned sequence of graphics and text are presented, showing the excellent performance of the proposed method compared to several state-of-the-art two-step and Bayesian Maximum a Posteriori SR algorithms.
Submitted 18 January, 2012;
originally announced January 2012.
-
Formal security analysis of registration protocols for interactive systems: a methodology and a case of study
Authors:
Jesus Diaz,
David Arroyo,
Francisco B. Rodriguez
Abstract:
In this work we present and formally analyze CHAT-SRP (CHAos based Tickets-Secure Registration Protocol), a protocol to provide interactive and collaborative platforms with a cryptographically robust solution to classical security issues. Namely, we focus on the secrecy and authenticity properties while keeping a high usability. In this sense, users are forced to blindly trust the system administrators and developers. Moreover, as far as we know, the use of formal methodologies for verifying the security properties of communication protocols is not yet common practice. We propose here a methodology to fill this gap, i.e., to analyse both the security of the proposed protocol and the pertinence of the underlying premises. To this end, we propose the definition and formal evaluation of a protocol for the distribution of digital identities. Once distributed, these identities can be used to verify the integrity and source of information. We base our security analysis on tools for the automatic verification of security protocols that are widely accepted by the scientific community, and on the principles they are based upon. In addition, perfect cryptographic primitives are assumed in order to focus the analysis on the exchange of protocol messages. The main property of our protocol is the incorporation of tickets, created using digests of chaos based nonces (numbers used only once) and users' personal data. Combined with a multichannel authentication scheme with some previous knowledge, these tickets provide security during the whole protocol by univocally linking each registering user with a single request. [..]
Submitted 6 September, 2012; v1 submitted 5 January, 2012;
originally announced January 2012.
-
Contextual Information Retrieval based on Algorithmic Information Theory and Statistical Outlier Detection
Authors:
Rafael Martinez,
Manuel Cebrian,
Francisco de Borja Rodriguez,
David Camacho
Abstract:
The main contribution of this paper is the design of an Information Retrieval (IR) technique based on Algorithmic Information Theory (using the Normalized Compression Distance, NCD), statistical techniques (outliers), and a novel organization of the database structure. The paper shows how they can be integrated to retrieve information from generic databases using long (text-based) queries. Two important problems are analyzed in the paper. On the one hand, we study how to detect "false positives" when the distance among the documents is very low and there is actual similarity. On the other hand, we propose a way to structure a document database in which the similarity distance estimation depends on the length of the selected text. Finally, the experimental evaluations carried out to study these problems are shown.
Submitted 27 November, 2007;
originally announced November 2007.
-
Evaluating the Impact of Information Distortion on Normalized Compression Distance
Authors:
Ana Granados,
Manuel Cebrian,
David Camacho,
Francisco de B. Rodriguez
Abstract:
In this paper we apply different techniques of information distortion to a set of classical books written in English. We study the impact that these distortions have upon the Kolmogorov complexity and the clustering by compression technique (the latter based on Normalized Compression Distance, NCD). We show how to decrease the complexity of the considered books by introducing several modifications in them. We measure how well the information contained in each book is maintained using a clustering error measure. We find experimentally that the best way to keep the clustering error is by means of modifications to the most frequent words. We explain the details of these information distortions and compare them with other kinds of modifications, such as random word distortions and infrequent word distortions. Finally, some phenomenological explanations of the different empirical results obtained are presented.
Submitted 9 May, 2008; v1 submitted 26 November, 2007;
originally announced November 2007.
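For readers unfamiliar with the Normalized Compression Distance mentioned in the abstract above, a minimal sketch follows. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The choice of `zlib` as the compressor and the sample inputs are assumptions made for illustration; the abstract does not say which compressor the authors used.

```python
import hashlib
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    A real compressor only approximates Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical inputs: a repetitive text and a pseudo-random byte string.
text = b"the quick brown fox jumps over the lazy dog " * 50
noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))

print(ncd(text, text))   # small: the second copy adds little new information
print(ncd(text, noise))  # close to 1: the inputs share almost no structure
```

Values near 0 indicate that one input is largely predictable from the other, while values near 1 indicate little shared structure.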
-
Accurate and robust image superresolution by neural processing of local image representations
Authors:
Carlos Miravet,
Francisco B. Rodriguez
Abstract:
Image superresolution involves the processing of an image sequence to generate a still image with higher resolution. Classical approaches, such as Bayesian MAP methods, require iterative minimization procedures with high computational costs. Recently, the authors proposed a method to tackle this problem based on a hybrid MLP-PNN architecture. In this paper, we present a novel superresolution method, based on an evolution of this concept, that incorporates the use of local image models. A neural processing stage receives as input the values of model coefficients on local windows. The data dimensionality is first reduced by applying PCA. An MLP, trained on synthetic sequences with various amounts of noise, estimates the high-resolution image data. The effect of varying the dimension of the network input space is examined, showing a complex, structured behavior. Quantitative results are presented showing the accuracy and robustness of the proposed method.
Submitted 3 October, 2005;
originally announced October 2005.
-
Dynamical Neural Network: Information and Topology
Authors:
David Dominguez,
Kostadin Koroutchev,
Eduardo Serrano,
Francisco B. Rodriguez
Abstract:
A neural network works as an associative memory device if it has large storage capacity and the quality of the retrieval is good enough. The learning and attractor abilities of the network both can be measured by the mutual information (MI), between patterns and retrieval states. This paper deals with a search for an optimal topology, of a Hebb network, in the sense of the maximal MI. We use small-world topology. The connectivity $γ$ ranges from an extremely diluted to the fully connected network; the randomness $ω$ ranges from purely local to completely random neighbors. It is found that, while stability implies an optimal $MI(γ,ω)$ at $γ_{opt}(ω)\to 0$, for the dynamics, the optimal topology holds at certain $γ_{opt}>0$ whenever $0\leqω<0.3$.
Submitted 20 June, 2005;
originally announced June 2005.
-
A hybrid MLP-PNN architecture for fast image superresolution
Authors:
Carlos Miravet,
Francisco B. Rodriguez
Abstract:
Image superresolution methods process an input image sequence of a scene to obtain a still image with increased resolution. Classical approaches to this problem involve complex iterative minimization procedures, typically with high computational costs. This paper proposes a novel algorithm for super-resolution that enables a substantial decrease in computational load. First, a probabilistic neural network architecture is used to perform a scattered-point interpolation of the image sequence data. The network kernel function is optimally determined for this problem by a multi-layer perceptron trained on synthetic data. The dependence of the network parameters on the sequence noise level is quantitatively analyzed. The resulting super-sampled image is then spatially filtered to correct finite pixel size effects, yielding the final high-resolution estimate. Results on a real outdoor sequence are presented, showing the quality of the proposed method.
Submitted 22 March, 2005;
originally announced March 2005.