Search | arXiv e-print repository

Exploring AI Capabilities in Participatory Budgeting within Smart Cities: The Case of Sao Paulo

Authors: Italo Alberto Sousa, Mariana Carvalho da Silva, Jorge Machado, José Carlos Vaz

Abstract: This research examines how Artificial Intelligence (AI) can improve participatory budgeting processes within smart cities. In response to challenges like declining civic participation and resource allocation conflicts, the study explores how online political participation can be improved by AI. It investigates the state capacity governments need to implement AI-enhanced participatory tools, consid… ▽ More This research examines how Artificial Intelligence (AI) can improve participatory budgeting processes within smart cities. In response to challenges like declining civic participation and resource allocation conflicts, the study explores how online political participation can be improved by AI. It investigates the state capacity governments need to implement AI-enhanced participatory tools, considering technological dependencies and vulnerabilities. It analyzes technological and administrative structures, actors, interests, and strategies to understand the dynamics of online political participation technologies in the case of Sao Paulo, Brazil. The study contributes to understanding how technological advancements can reshape participatory budgeting processes. In a broader sense, the research highlights how AI can transform participatory institutions by offering new tools for citizens and also for government officials in charge of participatory processes within smart cities. △ Less

Submitted 20 September, 2025; originally announced September 2025.

Comments: 22 pages, Presented at 28th IPSA World Congress of Political Science, Seoul 2025

arXiv:2509.16603 [pdf, ps, other]

An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation

Authors: Maurício do V. M. da Costa, Eloi Moliner

Abstract: This paper introduces MR-CQTdiff, a novel neural-network architecture for diffusion-based audio generation that leverages a multi-resolution Constant-$Q$ Transform (C$Q$T). The proposed architecture employs an efficient, invertible CQT framework that adjusts the time-frequency resolution on an octave-by-octave basis. This design addresses the issue of low temporal resolution at lower frequencies,… ▽ More This paper introduces MR-CQTdiff, a novel neural-network architecture for diffusion-based audio generation that leverages a multi-resolution Constant-$Q$ Transform (C$Q$T). The proposed architecture employs an efficient, invertible CQT framework that adjusts the time-frequency resolution on an octave-by-octave basis. This design addresses the issue of low temporal resolution at lower frequencies, enabling more flexible and expressive audio generation. We conduct an evaluation using the Fréchet Audio Distance (FAD) metric across various architectures and two datasets. Experimental results demonstrate that MR-CQTdiff achieves state-of-the-art audio quality, outperforming competing architectures. △ Less

Submitted 20 September, 2025; originally announced September 2025.

Comments: accepted at IEEE International Symposium on the Internet of Sounds

arXiv:2509.16300 [pdf, ps, other]

ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge

Authors: Manh Cuong Dao, The Hung Tran, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang

Abstract: This paper studies the black-box optimization task which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved via learning and optimizing a surrogate function with that offline data. Alternatively, it can also be framed as an inverse modeling task that maps a desired performance to potential input candidates that achieve it.… ▽ More This paper studies the black-box optimization task which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved via learning and optimizing a surrogate function with that offline data. Alternatively, it can also be framed as an inverse modeling task that maps a desired performance to potential input candidates that achieve it. Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. This is formulated as learning a probabilistic bridge transforming an implicit distribution of low-value inputs (i.e., offline data) into another distribution of high-value inputs (i.e., solution candidates). Such probabilistic bridge can be learned using low- and high-value inputs sampled from synthetic functions that resemble the target function. These synthetic functions are constructed as the mean posterior of multiple Gaussian processes fitted with different parameterizations on the offline data, alleviating the data bottleneck. The proposed approach is evaluated on an extensive benchmark comprising most recent methods, demonstrating significant improvement and establishing a new state-of-the-art performance. △ Less

Submitted 19 September, 2025; originally announced September 2025.

Comments: The first two authors contributed equally

Journal ref: NeurIPS 2025 Spotlight

arXiv:2509.01422 [pdf, ps, other]

Exploring Quantum Machine Learning for Weather Forecasting

Authors: Maria Heloísa F. da Silva, Gleydson F. de Jesus, Christiano M. S. Nascimento, Valéria L. da Silva, Clebson Cruz

Abstract: Weather forecasting plays a crucial role in supporting strategic decisions across various sectors, including agriculture, renewable energy production, and disaster management. However, the inherently dynamic and chaotic behavior of the atmosphere presents significant challenges to conventional predictive models. On the other hand, introducing quantum computing simulation techniques to the forecast… ▽ More Weather forecasting plays a crucial role in supporting strategic decisions across various sectors, including agriculture, renewable energy production, and disaster management. However, the inherently dynamic and chaotic behavior of the atmosphere presents significant challenges to conventional predictive models. On the other hand, introducing quantum computing simulation techniques to the forecasting problems constitutes a promising alternative to overcome these challenges. In this context, this work explores the emerging intersection between quantum machine learning (QML) and climate forecasting. We present the implementation of a Quantum Neural Network (QNN) trained on real meteorological data from NASA's Prediction of Worldwide Energy Resources (POWER) database. The results show that QNN has the potential to outperform a classical Recurrent Neural Network (RNN) in terms of accuracy and adaptability to abrupt data shifts, particularly in wind speed prediction. Despite observed nonlinearities and architectural sensitivities, the QNN demonstrated robustness in handling temporal variability and faster convergence in temperature prediction. These findings highlight the potential of quantum models in short and medium term climate prediction, while also revealing key challenges and future directions for optimization and broader applicability. △ Less

Submitted 1 September, 2025; originally announced September 2025.

arXiv:2508.18073 [pdf, ps, other]

Debian in the Research Software Ecosystem: A Bibliometric Analysis

Authors: Joenio Marques da Costa, Christina von Flach

Abstract: Context: The Debian system has historically participated in academic works and scientific projects, with well-known examples including NeuroDebian, Debian Med, Debsources, Debian Science, and Debian GIS, where the scientific relevance of Debian and its contribution to the Research Software ecosystem are evident. Objective: The objective of this study is to investigate the Debian system through a… ▽ More Context: The Debian system has historically participated in academic works and scientific projects, with well-known examples including NeuroDebian, Debian Med, Debsources, Debian Science, and Debian GIS, where the scientific relevance of Debian and its contribution to the Research Software ecosystem are evident. Objective: The objective of this study is to investigate the Debian system through academic publications, with the aim of classifying articles, mapping research, identifying trends, and finding opportunities. Method: The study is based on a bibliometric analysis starting with an initial search for the term "Debian" in the titles, abstracts, or keywords of academic publications, using the Scopus database. This analysis calculates metrics of co-citation, co-authorship, and word co-occurrence, and is guided by a set of research questions and criteria for inclusion and exclusion to conduct the bibliometric analysis. Results: The study includes a set of articles published across various fields of knowledge, providing a map of the academic publication space about Debian. The study's data will be available in a public repository, reporting demographic and bibliometric trends, including the most cited articles, active countries, researchers, and popular conferences. Conclusion: Results includes a bibliometric and demographic analysis identified in publications about Debian, shedding light on the intellectual structure of academic research. The results of the analyses can help researchers gain an overview of existing trends in publications about Debian and identify areas that require more attention from the scientific community. △ Less

Submitted 25 August, 2025; originally announced August 2025.

Comments: 5 pages; 3 figures; 2 tables; to be published in DebConf25 Academic Track https://www.diverse-team.fr/debconf25-academictrack

arXiv:2508.14994 [pdf, ps, other]

A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot

Authors: Murilo Vinicius da Silva, Matheus Hipolito Carvalho, Juliano Negri, Thiago Segreto, Gustavo J. G. Lahr, Ricardo V. Godoy, Marcelo Becker

Abstract: In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collisio… ▽ More In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments. △ Less

Submitted 20 August, 2025; originally announced August 2025.

arXiv:2508.09757 [pdf, ps, other]

NEUBORN: The Neurodevelopmental Evolution framework Using BiOmechanical RemodelliNg

Authors: Nashira Baena, Mariana da Silva, Irina Grigorescu, Aakash Saboo, Saga Masui, Jaques-Donald Tournier, Emma C. Robinson

Abstract: Understanding individual cortical development is essential for identifying deviations linked to neurodevelopmental disorders. However, current normative modelling frameworks struggle to capture fine-scale anatomical details due to their reliance on modelling data within a population-average reference space. Here, we present a novel framework for learning individual growth trajectories from biomech… ▽ More Understanding individual cortical development is essential for identifying deviations linked to neurodevelopmental disorders. However, current normative modelling frameworks struggle to capture fine-scale anatomical details due to their reliance on modelling data within a population-average reference space. Here, we present a novel framework for learning individual growth trajectories from biomechanically constrained, longitudinal, diffeomorphic image registration, implemented via a hierarchical network architecture. Trained on neonatal MRI data from the Developing Human Connectome Project, the method improves the biological plausibility of warps, generating growth trajectories that better follow population-level trends while generating smoother warps, with fewer negative Jacobians, relative to state-of-the-art baselines. The resulting subject-specific deformations provide interpretable, biologically grounded mappings of development. This framework opens new possibilities for predictive modeling of brain maturation and early identification of malformations of cortical development. △ Less

Submitted 13 August, 2025; originally announced August 2025.

arXiv:2508.08592 [pdf, ps, other]

Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media

Authors: Van-Hoang Phan, Tung-Duong Le-Duc, Long-Khanh Pham, Anh-Thu Le, Quynh-Huong Dinh-Nguyen, Dang-Quan Vo, Hoang-Quoc Nguyen-Son, Anh-Duy Tran, Dang Vu, Minh-Son Dao

Abstract: The proliferation of multimedia content on social media platforms has dramatically transformed how information is consumed and disseminated. While this shift enables real-time coverage of global events, it also facilitates the rapid spread of misinformation and disinformation, especially during crises such as wars, natural disasters, or elections. The rise of synthetic media and the reuse of authe… ▽ More The proliferation of multimedia content on social media platforms has dramatically transformed how information is consumed and disseminated. While this shift enables real-time coverage of global events, it also facilitates the rapid spread of misinformation and disinformation, especially during crises such as wars, natural disasters, or elections. The rise of synthetic media and the reuse of authentic content in misleading contexts have intensified the need for robust multimedia verification tools. In this paper, we present a comprehensive system developed for the ACM Multimedia 2025 Grand Challenge on Multimedia Verification. Our system assesses the authenticity and contextual accuracy of multimedia content in multilingual settings and generates both expert-oriented verification reports and accessible summaries for the general public. We introduce a unified verification pipeline that integrates visual forensics, textual analysis, and multimodal reasoning, and propose a hybrid approach to detect out-of-context (OOC) media through semantic similarity, temporal alignment, and geolocation cues. Extensive evaluations on the Grand Challenge benchmark demonstrate the system's effectiveness across diverse real-world scenarios. Our contributions advance the state of the art in multimedia verification and offer practical tools for journalists, fact-checkers, and researchers confronting information integrity challenges in the digital age. △ Less

Submitted 13 August, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

Comments: Accept to ACM MM 2025

arXiv:2508.06591 [pdf, ps, other]

Generative Artificial Intelligence Extracts Structure-Function Relationships from Plants for New Materials

Authors: Rachel K. Luu, Jingyu Deng, Mohammed Shahrudin Ibrahim, Nam-Joon Cho, Ming Dao, Subra Suresh, Markus J. Buehler

Abstract: Large language models (LLMs) have reshaped the research landscape by enabling new approaches to knowledge retrieval and creative ideation. Yet their application in discipline-specific experimental science, particularly in highly multi-disciplinary domains like materials science, remains limited. We present a first-of-its-kind framework that integrates generative AI with literature from hitherto-un… ▽ More Large language models (LLMs) have reshaped the research landscape by enabling new approaches to knowledge retrieval and creative ideation. Yet their application in discipline-specific experimental science, particularly in highly multi-disciplinary domains like materials science, remains limited. We present a first-of-its-kind framework that integrates generative AI with literature from hitherto-unconnected fields such as plant science, biomimetics, and materials engineering to extract insights and design experiments for materials. We focus on humidity-responsive systems such as pollen-based materials and Rhapis excelsa (broadleaf lady palm) leaves, which exhibit self-actuation and adaptive performance. Using a suite of AI tools, including a fine-tuned model (BioinspiredLLM), Retrieval-Augmented Generation (RAG), agentic systems, and a Hierarchical Sampling strategy, we extract structure-property relationships and translate them into new classes of bioinspired materials. Structured inference protocols generate and evaluate hundreds of hypotheses from a single query, surfacing novel and experimentally tractable ideas. We validate our approach through real-world implementation: LLM-generated procedures, materials designs, and mechanical predictions were tested in the laboratory, culminating in the fabrication of a novel pollen-based adhesive with tunable morphology and measured shear strength, establishing a foundation for future plant-derived adhesive design. This work demonstrates how AI-assisted ideation can drive real-world materials design and enable effective human-AI collaboration. △ Less

Submitted 8 August, 2025; originally announced August 2025.

arXiv:2507.17843 [pdf, ps, other]

doi 10.5753/wpeif.2025.8714

Optimizing Edge Gaming Slices through an Enhanced User Plane Function and Analytics in Beyond-5G Networks

Authors: Bruno Marques da Silva, Larissa Ferreira Rodrigues Moreira, Flávio de Oliveira Silva, Rodrigo Moreira

Abstract: The latest generation of games and pervasive communication technologies poses challenges in service management and Service-Level Agreement compliance for mobile users. State-of-the-art edge-gaming techniques enhance throughput, reduce latency, and leverage cloud computing. However, further development of core functions such as the User Plane Function (UPF) is needed for non-intrusive user latency… ▽ More The latest generation of games and pervasive communication technologies poses challenges in service management and Service-Level Agreement compliance for mobile users. State-of-the-art edge-gaming techniques enhance throughput, reduce latency, and leverage cloud computing. However, further development of core functions such as the User Plane Function (UPF) is needed for non-intrusive user latency measurement. This paper proposes a closed-loop architecture integrating the Network Data Analytics Function (NWDAF) and UPF to estimate user latency and enhance the 5G control plane by making it latency-aware. The results show that embedding an artificial intelligence model within NWDAF enables game classification and opens new avenues for mobile edge gaming research. △ Less

Submitted 23 July, 2025; originally announced July 2025.

arXiv:2507.01719 [pdf]

Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America

Authors: Dorian Peters, Fernanda Espinoza, Marco da Re, Guido Ivetta, Luciana Benotti, Rafael A. Calvo

Abstract: There is justifiable interest in leveraging conversational AI (CAI) for health across the majority world, but to be effective, CAI must respond appropriately within culturally and linguistically diverse contexts. Therefore, we need ways to address the fact that current LLMs exclude many lived experiences globally. Various advances are underway which focus on top-down approaches and increasing trai… ▽ More There is justifiable interest in leveraging conversational AI (CAI) for health across the majority world, but to be effective, CAI must respond appropriately within culturally and linguistically diverse contexts. Therefore, we need ways to address the fact that current LLMs exclude many lived experiences globally. Various advances are underway which focus on top-down approaches and increasing training data. In this paper, we aim to complement these with a bottom-up locally-grounded approach based on qualitative data collected during participatory workshops in Latin America. Our goal is to construct a rich and human-centred understanding of: a) potential areas of cultural misalignment in digital health; b) regional perspectives on chatbots for health and c)strategies for creating culturally-appropriate CAI; with a focus on the understudied Latin American context. Our findings show that academic boundaries on notions of culture lose meaning at the ground level and technologies will need to engage with a broader framework; one that encapsulates the way economics, politics, geography and local logistics are entangled in cultural experience. To this end, we introduce a framework for 'Pluriversal Conversational AI for Health' which allows for the possibility that more relationality and tolerance, rather than just more data, may be called for. △ Less

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2506.20944 [pdf, ps, other]

E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs

Authors: Van-Hoang Phan, Long-Khanh Pham, Dang Vu, Anh-Duy Tran, Minh-Son Dao

Abstract: The rapid spread of misinformation in mobile and wireless networks presents critical security challenges. This study introduces a training-free, retrieval-based multimodal fact verification system that leverages pretrained vision-language models and large language models for credibility assessment. By dynamically retrieving and cross-referencing trusted data sources, our approach mitigates vulnera… ▽ More The rapid spread of misinformation in mobile and wireless networks presents critical security challenges. This study introduces a training-free, retrieval-based multimodal fact verification system that leverages pretrained vision-language models and large language models for credibility assessment. By dynamically retrieving and cross-referencing trusted data sources, our approach mitigates vulnerabilities of traditional training-based models, such as adversarial attacks and data poisoning. Additionally, its lightweight design enables seamless edge device integration without extensive on-device processing. Experiments on two fact-checking benchmarks achieve SOTA results, confirming its effectiveness in misinformation detection and its robustness against various attack vectors, highlighting its potential to enhance security in mobile and wireless communication environments. △ Less

Submitted 25 June, 2025; originally announced June 2025.

Comments: Accepted to AsiaCCS 2025 @ SCID

arXiv:2506.20531 [pdf, ps, other]

Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios

Authors: Wenbin Gan, Minh-Son Dao, Koji Zettsu

Abstract: Driving in safety-critical scenarios requires quick, context-aware decision-making grounded in both situational understanding and experiential reasoning. Large Language Models (LLMs), with their powerful general-purpose reasoning capabilities, offer a promising foundation for such decision-making. However, their direct application to autonomous driving remains limited due to challenges in domain a… ▽ More Driving in safety-critical scenarios requires quick, context-aware decision-making grounded in both situational understanding and experiential reasoning. Large Language Models (LLMs), with their powerful general-purpose reasoning capabilities, offer a promising foundation for such decision-making. However, their direct application to autonomous driving remains limited due to challenges in domain adaptation, contextual grounding, and the lack of experiential knowledge needed to make reliable and interpretable decisions in dynamic, high-risk environments. To address this gap, this paper presents a Case-Based Reasoning Augmented Large Language Model (CBR-LLM) framework for evasive maneuver decision-making in complex risk scenarios. Our approach integrates semantic scene understanding from dashcam video inputs with the retrieval of relevant past driving cases, enabling LLMs to generate maneuver recommendations that are both context-sensitive and human-aligned. Experiments across multiple open-source LLMs show that our framework improves decision accuracy, justification quality, and alignment with human expert behavior. Risk-aware prompting strategies further enhance performance across diverse risk types, while similarity-based case retrieval consistently outperforms random sampling in guiding in-context learning. Case studies further demonstrate the framework's robustness in challenging real-world conditions, underscoring its potential as an adaptive and trustworthy decision-support tool for intelligent driving systems. △ Less

Submitted 25 June, 2025; originally announced June 2025.

Comments: 12 pages, 10 figures, under-review conference

arXiv:2506.11180 [pdf, ps, other]

Beyond Formal Semantics for Capabilities and Skills: Model Context Protocol in Manufacturing

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff

Abstract: Explicit modeling of capabilities and skills -- whether based on ontologies, Asset Administration Shells, or other technologies -- requires considerable manual effort and often results in representations that are not easily accessible to Large Language Models (LLMs). In this work-in-progress paper, we present an alternative approach based on the recently introduced Model Context Protocol (MCP). MC… ▽ More Explicit modeling of capabilities and skills -- whether based on ontologies, Asset Administration Shells, or other technologies -- requires considerable manual effort and often results in representations that are not easily accessible to Large Language Models (LLMs). In this work-in-progress paper, we present an alternative approach based on the recently introduced Model Context Protocol (MCP). MCP allows systems to expose functionality through a standardized interface that is directly consumable by LLM-based agents. We conduct a prototypical evaluation on a laboratory-scale manufacturing system, where resource functions are made available via MCP. A general-purpose LLM is then tasked with planning and executing a multi-step process, including constraint handling and the invocation of resource functions via MCP. The results indicate that such an approach can enable flexible industrial automation without relying on explicit semantic models. This work lays the basis for further exploration of external tool integration in LLM-driven production systems. △ Less

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.05631 [pdf, ps, other]

The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists

Authors: Veselin B. Kostov, Brian P. Powell, Aline U. Fornear, Marco Z. Di Fraia, Robert Gagliano, Thomas L. Jacobs, Julien S. de Lambilly, Hugo A. Durantini Luca, Steven R. Majewski, Mark Omohundro, Jerome Orosz, Saul A. Rappaport, Ryan Salik, Donald Short, William Welsh, Svetoslav Alexandrov, Cledison Marcos da Silva, Erika Dunning, Gerd Guhne, Marc Huten, Michiharu Hyogo, Davide Iannone, Sam Lee, Christian Magliano, Manya Sharma , et al. (14 additional authors not shown)

Abstract: The Transiting Exoplanet Survey Satellite (TESS) has surveyed nearly the entire sky in Full-Frame Image mode with a time resolution of 200 seconds to 30 minutes and a temporal baseline of at least 27 days. In addition to the primary goal of discovering new exoplanets, TESS is exceptionally capable at detecting variable stars, and in particular short-period eclipsing binaries which are relatively c… ▽ More The Transiting Exoplanet Survey Satellite (TESS) has surveyed nearly the entire sky in Full-Frame Image mode with a time resolution of 200 seconds to 30 minutes and a temporal baseline of at least 27 days. In addition to the primary goal of discovering new exoplanets, TESS is exceptionally capable at detecting variable stars, and in particular short-period eclipsing binaries which are relatively common, making up a few percent of all stars, and represent powerful astrophysical laboratories for deep investigations of stellar formation and evolution. We combed Sectors 1-82 of TESS Full-Frame Image data searching for eclipsing binary stars using a neural network that identified ~1.2 million stars with eclipse-like features. Of these, we have performed an in-depth analysis on ~60,000 targets using automated methods and manual inspection by citizen scientists. Here we present a catalog of 10001 uniformly-vetted and -validated eclipsing binary stars that passed all our ephemeris and photocenter tests, as well as complementary visual inspection. Of these, 7936 are new eclipsing binaries while the remaining 2065 are known systems for which we update the published ephemerides. We outline the detection and analysis of the targets, discuss the properties of the sample, and highlight potentially interesting systems. Finally, we also provide a list of ~900,000 unvetted and unvalidated targets for which the neural network found eclipse-like features with a score higher than 0.9, and for which there are no known eclipsing binaries within a sky-projected separation of a TESS pixel (~21 arcsec). △ Less

Submitted 5 June, 2025; originally announced June 2025.

Comments: 40 pages, 39 figures, 4 tables

arXiv:2505.23879 [pdf, other]

CNN-LSTM Hybrid Model for AI-Driven Prediction of COVID-19 Severity from Spike Sequences and Clinical Data

Authors: Caio Cheohen, Vinnícius M. S. Gomes, Manuela L. da Silva

Abstract: The COVID-19 pandemic, caused by SARS-CoV-2, highlighted the critical need for accurate prediction of disease severity to optimize healthcare resource allocation and patient management. The spike protein, which facilitates viral entry into host cells, exhibits high mutation rates, particularly in the receptor-binding domain, influencing viral pathogenicity. Artificial intelligence approaches, such… ▽ More The COVID-19 pandemic, caused by SARS-CoV-2, highlighted the critical need for accurate prediction of disease severity to optimize healthcare resource allocation and patient management. The spike protein, which facilitates viral entry into host cells, exhibits high mutation rates, particularly in the receptor-binding domain, influencing viral pathogenicity. Artificial intelligence approaches, such as deep learning, offer promising solutions for leveraging genomic and clinical data to predict disease outcomes. Objective: This study aimed to develop a hybrid CNN-LSTM deep learning model to predict COVID-19 severity using spike protein sequences and associated clinical metadata from South American patients. Methods: We retrieved 9,570 spike protein sequences from the GISAID database, of which 3,467 met inclusion criteria after standardization. The dataset included 2,313 severe and 1,154 mild cases. A feature engineering pipeline extracted features from sequences, while demographic and clinical variables were one-hot encoded. A hybrid CNN-LSTM architecture was trained, combining CNN layers for local pattern extraction and an LSTM layer for long-term dependency modeling. Results: The model achieved an F1 score of 82.92%, ROC-AUC of 0.9084, precision of 83.56%, and recall of 82.85%, demonstrating robust classification performance. Training stabilized at 85% accuracy with minimal overfitting. The most prevalent lineages (P.1, AY.99.2) and clades (GR, GK) aligned with regional epidemiological trends, suggesting potential associations between viral genetics and clinical outcomes. Conclusion: The CNN-LSTM hybrid model effectively predicted COVID-19 severity using spike protein sequences and clinical data, highlighting the utility of AI in genomic surveillance and precision public health. Despite limitations, this approach provides a framework for early severity prediction in future outbreaks. △ Less

Submitted 29 May, 2025; originally announced May 2025.

Comments: 12 pages, 4 figures, 4 tables

MSC Class: 68T07; 62P10; 92C50; 68T05 ACM Class: I.2.6; I.5.1; J.3

arXiv:2505.14714 [pdf, ps, other]

KGAlign: Joint Semantic-Structural Knowledge Encoding for Multimodal Fake News Detection

Authors: Tuan-Vinh La, Minh-Hieu Nguyen, Minh-Son Dao

Abstract: Fake news detection remains a challenging problem due to the complex interplay between textual misinformation, manipulated images, and external knowledge reasoning. While existing approaches have achieved notable results in verifying veracity and cross-modal consistency, two key challenges persist: (1) Existing methods often consider only the global image context while neglecting local object-leve… ▽ More Fake news detection remains a challenging problem due to the complex interplay between textual misinformation, manipulated images, and external knowledge reasoning. While existing approaches have achieved notable results in verifying veracity and cross-modal consistency, two key challenges persist: (1) Existing methods often consider only the global image context while neglecting local object-level details, and (2) they fail to incorporate external knowledge and entity relationships for deeper semantic understanding. To address these challenges, we propose a novel multi-modal fake news detection framework that integrates visual, textual, and knowledge-based representations. Our approach leverages bottom-up attention to capture fine-grained object details, CLIP for global image semantics, and RoBERTa for context-aware text encoding. We further enhance knowledge utilization by retrieving and adaptively selecting relevant entities from a knowledge graph. The fused multi-modal features are processed through a Transformer-based classifier to predict news veracity. Experimental results demonstrate that our model outperforms recent approaches, showcasing the effectiveness of neighbor selection mechanism and multi-modal fusion for fake news detection. Our proposal introduces a new paradigm: knowledge-grounded multimodal reasoning. By integrating explicit entity-level selection and NLI-guided filtering, we shift fake news detection from feature fusion to semantically grounded verification. For reproducibility and further research, the source code is publicly at \href{https://github.com/latuanvinh1998/KGAlign}{github.com/latuanvinh1998/KGAlign}. △ Less

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.05409 [pdf, other]

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It

Authors: Marvin F. da Silva, Felix Dangel, Sageev Oore

Abstract: The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We argue that existing sharpness measures fail for transformers, because they have much richer symmetries in their attention mechanism that induce directions in p… ▽ More The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We argue that existing sharpness measures fail for transformers, because they have much richer symmetries in their attention mechanism that induce directions in parameter space along which the network or its loss remain identical. We posit that sharpness must account fully for these symmetries, and thus we redefine it on a quotient manifold that results from quotienting out the transformer symmetries, thereby removing their ambiguities. Leveraging tools from Riemannian geometry, we propose a fully general notion of sharpness, in terms of a geodesic ball on the symmetry-corrected quotient manifold. In practice, we need to resort to approximating the geodesics. Doing so up to first order yields existing adaptive sharpness measures, and we demonstrate that including higher-order terms is crucial to recover correlation with generalization. We present results on diagonal networks with synthetic data, and show that our geodesic sharpness reveals strong correlation for real-world transformers on both text and image classification tasks. △ Less

Submitted 8 May, 2025; originally announced May 2025.

arXiv:2505.03295 [pdf, other]

Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Nicolas König, Felix Gehlhoff, Alexander Fay

Abstract: Modern automation systems increasingly rely on modular architectures, with capabilities and skills as one solution approach. Capabilities define the functions of resources in a machine-readable form and skills provide the concrete implementations that realize those capabilities. However, the development of a skill implementation conforming to a corresponding capability remains a time-consuming and… ▽ More Modern automation systems increasingly rely on modular architectures, with capabilities and skills as one solution approach. Capabilities define the functions of resources in a machine-readable form and skills provide the concrete implementations that realize those capabilities. However, the development of a skill implementation conforming to a corresponding capability remains a time-consuming and challenging task. In this paper, we present a method that treats capabilities as contracts for skill implementations and leverages large language models to generate executable code based on natural language user input. A key feature of our approach is the integration of existing software libraries and interface technologies, enabling the generation of skill implementations across different target languages. We introduce a framework that allows users to incorporate their own libraries and resource interfaces into the code generation process through a retrieval-augmented generation architecture. The proposed method is evaluated using an autonomous mobile robot controlled via Python and ROS 2, demonstrating the feasibility and flexibility of the approach. △ Less

Submitted 6 May, 2025; originally announced May 2025.

arXiv:2504.07437 [pdf, ps, other]

Unifying and extending Diffusion Models through PDEs for solving Inverse Problems

Authors: Agnimitra Dasgupta, Alexsander Marciano da Cunha, Ali Fardisi, Mehrnegar Aminy, Brianna Binder, Bryan Shaddy, Assad A Oberai

Abstract: Diffusion models have emerged as powerful generative tools with applications in computer vision and scientific machine learning (SciML), where they have been used to solve large-scale probabilistic inverse problems. Traditionally, these models have been derived using principles of variational inference, denoising, statistical signal processing, and stochastic differential equations. In contrast to… ▽ More Diffusion models have emerged as powerful generative tools with applications in computer vision and scientific machine learning (SciML), where they have been used to solve large-scale probabilistic inverse problems. Traditionally, these models have been derived using principles of variational inference, denoising, statistical signal processing, and stochastic differential equations. In contrast to the conventional presentation, in this study we derive diffusion models using ideas from linear partial differential equations and demonstrate that this approach has several benefits that include a constructive derivation of the forward and reverse processes, a unified derivation of multiple formulations and sampling strategies, and the discovery of a new class of variance preserving models. We also apply the conditional version of these models to solve canonical conditional density estimation problems and challenging inverse problems. These problems help establish benchmarks for systematically quantifying the performance of different formulations and sampling strategies in this study and for future studies. Finally, we identify and implement a mechanism through which a single diffusion model can be applied to measurements obtained from multiple measurement operators. Taken together, the contents of this manuscript provide a new understanding of and several new directions in the application of diffusion models to solving physics-based inverse problems. △ Less

Submitted 3 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

arXiv:2503.14514 [pdf, other]

Acceptance or Rejection of Lots while Minimizing and Controlling Type I and Type II Errors

Authors: Edson Luiz Ursini, Elaine Cristina Catapani Poletti, Loreno Menezes da Silveira, José Roberto Emiliano Leite

Abstract: The double hypothesis test (DHT) is a test that allows controlling Type I (producer) and Type II (consumer) errors. It is possible to say whether the batch has a defect rate, p, between 1.5 and 2%, or between 2 and 5%, or between 5 and 10%, and so on, until finding a required value for this probability. Using the two probabilities side by side, the Type I error for the lower probability distributi… ▽ More The double hypothesis test (DHT) is a test that allows controlling Type I (producer) and Type II (consumer) errors. It is possible to say whether the batch has a defect rate, p, between 1.5 and 2%, or between 2 and 5%, or between 5 and 10%, and so on, until finding a required value for this probability. Using the two probabilities side by side, the Type I error for the lower probability distribution and the Type II error for the higher probability distribution, both can be controlled and minimized. It can be applied in the development or manufacturing process of a batch of components, or in the case of purchasing from a supplier, when the percentage of defects (p) is unknown, considering the technology and/or process available to obtain them. The power of the test is amplified by the joint application of the Limit of Successive Failures (LSF) related to the Renewal Theory. To enable the choice of the most appropriate algorithm for each application. Four distributions are proposed for the Bernoulli event sequence, including their computational efforts: Binomial, Binomial approximated by Poisson, and Binomial approximated by Gaussian (with two variants). Fuzzy logic rules are also applied to facilitate decision-making. △ Less

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2503.04242 [pdf, other]

Incorporating Surrogate Gradient Norm to Improve Offline Optimization Techniques

Authors: Manh Cuong Dao, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang

Abstract: Offline optimization has recently emerged as an increasingly popular approach to mitigate the prohibitively expensive cost of online experimentation. The key idea is to learn a surrogate of the black-box function that underlines the target experiment using a static (offline) dataset of its previous input-output queries. Such an approach is, however, fraught with an out-of-distribution issue where… ▽ More Offline optimization has recently emerged as an increasingly popular approach to mitigate the prohibitively expensive cost of online experimentation. The key idea is to learn a surrogate of the black-box function that underlines the target experiment using a static (offline) dataset of its previous input-output queries. Such an approach is, however, fraught with an out-of-distribution issue where the learned surrogate becomes inaccurate outside the offline data regimes. To mitigate this, existing offline optimizers have proposed numerous conditioning techniques to prevent the learned surrogate from being too erratic. Nonetheless, such conditioning strategies are often specific to particular surrogate or search models, which might not generalize to a different model choice. This motivates us to develop a model-agnostic approach instead, which incorporates a notion of model sharpness into the training loss of the surrogate as a regularizer. Our approach is supported by a new theoretical analysis demonstrating that reducing surrogate sharpness on the offline dataset provably reduces its generalized sharpness on unseen data. Our analysis extends existing theories from bounding generalized prediction loss (on unseen data) with loss sharpness to bounding the worst-case generalized surrogate sharpness with its empirical estimate on training data, providing a new perspective on sharpness regularization. Our extensive experimentation on a diverse range of optimization tasks also shows that reducing surrogate sharpness often leads to significant improvement, marking (up to) a noticeable 9.6% performance boost. Our code is publicly available at https://github.com/cuong-dm/IGNITE △ Less

Submitted 6 March, 2025; originally announced March 2025.

Journal ref: The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

arXiv:2503.04181 [pdf, other]

Boosting Offline Optimizers with Surrogate Sensitivity

Authors: Manh Cuong Dao, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang

Abstract: Offline optimization is an important task in numerous material engineering domains where online experimentation to collect data is too expensive and needs to be replaced by an in silico maximization of a surrogate of the black-box function. Although such a surrogate can be learned from offline data, its prediction might not be reliable outside the offline data regime, which happens when the surrog… ▽ More Offline optimization is an important task in numerous material engineering domains where online experimentation to collect data is too expensive and needs to be replaced by an in silico maximization of a surrogate of the black-box function. Although such a surrogate can be learned from offline data, its prediction might not be reliable outside the offline data regime, which happens when the surrogate has narrow prediction margin and is (therefore) sensitive to small perturbations of its parameterization. This raises the following questions: (1) how to regulate the sensitivity of a surrogate model; and (2) whether conditioning an offline optimizer with such less sensitive surrogate will lead to better optimization performance. To address these questions, we develop an optimizable sensitivity measurement for the surrogate model, which then inspires a sensitivity-informed regularizer that is applicable to a wide range of offline optimizers. This development is both orthogonal and synergistic to prior research on offline optimization, which is demonstrated in our extensive experiment benchmark. △ Less

Submitted 6 March, 2025; originally announced March 2025.

Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10072-10090, 2024

arXiv:2502.14156 [pdf, ps, other]

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Authors: Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan, Stewart Worrall, Julie Stephany Berrio Perez

Abstract: Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems. However, existing V2X datasets are limited in scope, diversity, and quality. To address these gaps, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes collected from three connected autono… ▽ More Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems. However, existing V2X datasets are limited in scope, diversity, and quality. To address these gaps, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes collected from three connected autonomous vehicles (CAVs) equipped with two different configurations of LiDAR sensors, plus a roadside unit with dual LiDARs. Our dataset provides point clouds and bounding box annotations across 10 classes, ensuring reliable data for perception training. We provide detailed statistical analysis on the quality of our dataset and extensively benchmark existing V2X methods on it. The Mixed Signals dataset is ready-to-use, with precise alignment and consistent annotations across time and viewpoints. Dataset website is available at https://mixedsignalsdataset.cs.cornell.edu/. △ Less

Submitted 28 August, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

Comments: International Conference on Computer Vision, ICCV 2025

arXiv:2502.06114 [pdf, other]

Enhanced 3D Object Detection via Diverse Feature Representations of 4D Radar Tensor

Authors: Seung-Hyun Song, Dong-Hee Paek, Minh-Quan Dao, Ezio Malis, Seung-Hyun Kong

Abstract: Recent advances in automotive four-dimensional (4D) Radar have enabled access to raw 4D Radar Tensor (4DRT), offering richer spatial and Doppler information than conventional point clouds. While most existing methods rely on heavily pre-processed, sparse Radar data, recent attempts to leverage raw 4DRT face high computational costs and limited scalability. To address these limitations, we propose… ▽ More Recent advances in automotive four-dimensional (4D) Radar have enabled access to raw 4D Radar Tensor (4DRT), offering richer spatial and Doppler information than conventional point clouds. While most existing methods rely on heavily pre-processed, sparse Radar data, recent attempts to leverage raw 4DRT face high computational costs and limited scalability. To address these limitations, we propose a novel three-dimensional (3D) object detection framework that maximizes the utility of 4DRT while preserving efficiency. Our method introduces a multi-teacher knowledge distillation (KD), where multiple teacher models are trained on point clouds derived from diverse 4DRT pre-processing techniques, each capturing complementary signal characteristics. These teacher representations are fused via a dedicated aggregation module and distilled into a lightweight student model that operates solely on a sparse Radar input. Experimental results on the K-Radar dataset demonstrate that our framework achieves improvements of 7.3% in AP_3D and 9.5% in AP_BEV over the baseline RTNH model when using extremely sparse inputs. Furthermore, it attains comparable performance to denser-input baselines while significantly reducing the input data size by about 90 times, confirming the scalability and efficiency of our approach. △ Less

Submitted 23 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

Comments: Arxiv preprint version

arXiv:2501.18671 [pdf, other]

Machine Learning Strategies for Parkinson Tremor Classification Using Wearable Sensor Data

Authors: Jesus Paucar-Escalante, Matheus Alves da Silva, Bruno De Lima Sanches, Aurea Soriano-Vargas, Laura Silveira Moriyama, Esther Luna Colombini

Abstract: Parkinson's disease (PD) is a neurological disorder requiring early and accurate diagnosis for effective management. Machine learning (ML) has emerged as a powerful tool to enhance PD classification and diagnostic accuracy, particularly by leveraging wearable sensor data. This survey comprehensively reviews current ML methodologies used in classifying Parkinsonian tremors, evaluating various tremo… ▽ More Parkinson's disease (PD) is a neurological disorder requiring early and accurate diagnosis for effective management. Machine learning (ML) has emerged as a powerful tool to enhance PD classification and diagnostic accuracy, particularly by leveraging wearable sensor data. This survey comprehensively reviews current ML methodologies used in classifying Parkinsonian tremors, evaluating various tremor data acquisition methodologies, signal preprocessing techniques, and feature selection methods across time and frequency domains, highlighting practical approaches for tremor classification. The survey explores ML models utilized in existing studies, ranging from traditional methods such as Support Vector Machines (SVM) and Random Forests to advanced deep learning architectures like Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM). We assess the efficacy of these models in classifying tremor patterns associated with PD, considering their strengths and limitations. Furthermore, we discuss challenges and discrepancies in current research and broader challenges in applying ML to PD diagnosis using wearable sensor data. We also outline future research directions to advance ML applications in PD diagnostics, providing insights for researchers and practitioners. △ Less

Submitted 30 January, 2025; originally announced January 2025.

Comments: 28 pages, 9 figures, 3 tables, Journal Artificial Intelligence In Medicine

arXiv:2501.13083 [pdf, other]

Boosting MCTS with Free Energy Minimization

Authors: Mawaba Pascal Dao, Adrian M. Peter

Abstract: Active Inference, grounded in the Free Energy Principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework, that integrates Monte Carlo Tree Search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key ins… ▽ More Active Inference, grounded in the Free Energy Principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework, that integrates Monte Carlo Tree Search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS already renowned for its search efficiency can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the Cross-Entropy Method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both standalone CEM and MCTS with random rollouts. △ Less

Submitted 22 January, 2025; originally announced January 2025.

arXiv:2501.04267 [pdf, other]

A 5G-Edge Architecture for Computational Offloading of Computer Vision Applications

Authors: Marcelo V. B. da Silva, Maria Barbosa, Anderson Queiroz, Kelvin L. Dias

Abstract: Processing computer vision applications (CVA) on mobile devices is challenging due to limited battery life and computing power. While cloud-based remote processing of CVA offers abundant computational resources, it introduces latency issues that can hinder real-time applications. To overcome this problem, computational offloading to edge servers has been adopted by industry and academic research.… ▽ More Processing computer vision applications (CVA) on mobile devices is challenging due to limited battery life and computing power. While cloud-based remote processing of CVA offers abundant computational resources, it introduces latency issues that can hinder real-time applications. To overcome this problem, computational offloading to edge servers has been adopted by industry and academic research. Furthermore, 5G access can also benefit CVA with lower latency and higher bandwidth than previous cellular generations. As the number of Mobile Operators and Internet Service providers relying on 5G access is growing, it is of paramount importance to elaborate a solution for supporting real time applications with the assistance of the edge computing. Besides that, open-source based platforms for Multi-access Edge Computing (MEC) and 5G core can be deployed to rapid prototyping and testing applications. This paper aims at providing an end-to-end solution of open-source MEC and 5G Core platforms along with a commercial 5G Radio. We first conceived a 5G-edge computing environment to assist near to user processing of computer vision applications. Then a sentiment analysis application is developed and integrated to the proposed 5G-Edge architecture. Finally, we conducted a performance evaluation of the proposed solution and compare it against a remote cloud-based approach in order to highlight the benefits of our proposal. The proposed architecture achieved a 260\% throughput performance increase and reduced response time by 71.3\% compared to the remote-cloud-based offloading. △ Less

Submitted 7 January, 2025; originally announced January 2025.

Comments: Accept on conference the 39th International Conference on Information Networking (ICOIN 2025): 6 pages, 8 figures, 1 table

arXiv:2412.09644 [pdf, other]

doi 10.1109/BIBM62325.2024.10821991

Combining knowledge graphs and LLMs for hazardous chemical information management and reuse

Authors: Marcos Da Silveira, Louis Deladiennee, Kheira Acem, Oona Freudenthal

Abstract: Human health is increasingly threatened by exposure to hazardous substances, particularly persistent and toxic chemicals. The link between these substances, often encountered in complex mixtures, and various diseases are demonstrated in scientific studies. However, this information is scattered across several sources and hardly accessible by humans and machines. This paper evaluates current practi… ▽ More Human health is increasingly threatened by exposure to hazardous substances, particularly persistent and toxic chemicals. The link between these substances, often encountered in complex mixtures, and various diseases are demonstrated in scientific studies. However, this information is scattered across several sources and hardly accessible by humans and machines. This paper evaluates current practices for publishing/accessing information on hazardous chemicals and proposes a novel platform designed to facilitate retrieval of critical chemical data in urgent situations. The platform aggregates information from multiple sources and organizes it into a structured knowledge graph. Users can access this information through a visual interface such as Neo4J Bloom and dashboards, or via natural language queries using a Chatbot. Our findings demonstrate a significant reduction in the time and effort required to access vital chemical information when datasets follow FAIR principles. Furthermore, we discuss the lessons learned from the development and implementation of this platform and provide recommendations for data owners and publishers to enhance data reuse and interoperability. This work aims to improve the accessibility and usability of chemical information by healthcare professionals, thereby supporting better health outcomes and informed decision-making in the face of patients exposed to chemical intoxication risks. △ Less

Submitted 10 December, 2024; originally announced December 2024.

Comments: Submitted to IEEE BIBM24

ACM Class: H.4; J.3

Journal ref: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

arXiv:2412.09153 [pdf, other]

Branch Sequentialization in Quantum Polytime

Authors: Emmanuel Hainry, Romain Péchoux, Mário Alberto Machado da Silva

Abstract: Quantum computation leverages the use of quantumly-controlled conditionals in order to achieve computational advantage. However, since the different branches in the conditional may operate on the same qubits, a typical approach to compilation involves performing the branches sequentially, which can easily lead to an exponential blowup of the program complexity. We introduce and study a compilation… ▽ More Quantum computation leverages the use of quantumly-controlled conditionals in order to achieve computational advantage. However, since the different branches in the conditional may operate on the same qubits, a typical approach to compilation involves performing the branches sequentially, which can easily lead to an exponential blowup of the program complexity. We introduce and study a compilation technique for avoiding branch sequentialization in a language that is sound and complete for quantum polynomial time, improving on previously existing polynomial size bounds and showing the existence of techniques that preserve the intuitive complexity of the program. △ Less

Submitted 28 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

arXiv:2410.16331 [pdf, other]

Exploring Quantum Neural Networks for Demand Forecasting

Authors: Gleydson Fernandes de Jesus, Maria Heloísa Fraga da Silva, Otto Menegasso Pires, Lucas Cruz da Silva, Clebson dos Santos Cruz, Valéria Loureiro da Silva

Abstract: Forecasting demand for assets and services can be addressed in various markets, providing a competitive advantage when the predictive models used demonstrate high accuracy. However, the training of machine learning models incurs high computational costs, which may limit the training of prediction models based on available computational capacity. In this context, this paper presents an approach for… ▽ More Forecasting demand for assets and services can be addressed in various markets, providing a competitive advantage when the predictive models used demonstrate high accuracy. However, the training of machine learning models incurs high computational costs, which may limit the training of prediction models based on available computational capacity. In this context, this paper presents an approach for training demand prediction models using quantum neural networks. For this purpose, a quantum neural network was used to forecast demand for vehicle financing. A classical recurrent neural network was used to compare the results, and they show a similar predictive capacity between the classical and quantum models, with the advantage of using a lower number of training parameters and also converging in fewer steps. Utilizing quantum computing techniques offers a promising solution to overcome the limitations of traditional machine learning approaches in training predictive models for complex market dynamics. △ Less

Submitted 19 October, 2024; originally announced October 2024.

Comments: 22 pages, 13 figures, 10 tables

arXiv:2410.14038 [pdf, ps, other]

Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Authors: Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Murilo L. da Luz, Telma W. de L. Soares, Luckeciano C. Melo

Abstract: Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding… ▽ More Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that transforms the classic 8-tile puzzle into a visual RL task with images drawn from arbitrarily large datasets. SPGym's key innovation lies in its ability to precisely control representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics, observation, and action spaces. This design enables researchers to isolate and scale the visual representation challenge independently of other learning components. Through extensive experiments with model-free and model-based RL algorithms, we uncover fundamental limitations in current methods' ability to handle visual diversity. As we increase the pool of possible images, all algorithms exhibit in- and out-of-distribution performance degradation, with sophisticated representation learning techniques often underperforming simpler approaches like data augmentation. These findings highlight critical gaps in visual representation learning for RL and establish SPGym as a valuable tool for driving progress in robust, generalizable decision-making systems. △ Less

Submitted 16 August, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: Accepted at ICML 2025

arXiv:2410.07003 [pdf, other]

Mirror Bridges Between Probability Measures

Authors: Leticia Mattos Da Silva, Silvia Sellán, Francisco Vargas, Justin Solomon

Abstract: Resampling from a target measure whose density is unknown is a fundamental problem in mathematical statistics and machine learning. A setting that dominates the machine learning literature consists of learning a map from an easy-to-sample prior, such as the Gaussian distribution, to a target measure. Under this model, samples from the prior are pushed forward to generate a new sample on the target… ▽ More Resampling from a target measure whose density is unknown is a fundamental problem in mathematical statistics and machine learning. A setting that dominates the machine learning literature consists of learning a map from an easy-to-sample prior, such as the Gaussian distribution, to a target measure. Under this model, samples from the prior are pushed forward to generate a new sample on the target measure, which is often difficult to sample from directly. Of particular interest is the problem of generating a new sample that is proximate to or otherwise conditioned on a given input sample. In this paper, we propose a new model called mirror bridges to solve this problem of conditional resampling. Our key observation is that solving the Schrödinger bridge problem between a distribution and itself provides a natural way to produce new samples from conditional distributions, giving in-distribution variations of an input data point. We demonstrate how to efficiently estimate the solution to this largely overlooked version of the Schrödinger bridge problem, and we prove that under mild conditions, the difference between our estimate and the true Schrödinger bridge can be controlled explicitly. We show that our proposed method leads to significant algorithmic simplifications over existing alternatives, in addition to providing control over in-distribution variation. Empirically, we demonstrate how these benefits can be leveraged to produce proximal samples in a number of application domains. △ Less

Submitted 19 May, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.02415 [pdf, other]

Cellular Network Densification: a System-level Analysis with IAB, NCR and RIS

Authors: Gabriel C. M. da Silva, Victor F. Monteiro, Diego A. Sousa, Darlan C. Moreira, Tarcisio F. Maciel, Fco. Rafael M. Lima, Behrooz Makki

Abstract: As the number of user equipments increases in fifth generation (5G) and beyond, it is desired to densify the cellular network with auxiliary nodes assisting the base stations. Examples of these nodes are integrated access and backhaul (IAB) nodes, network-controlled repeaters (NCRs) and reconfigurable intelligent surfaces (RISs). In this context, this work presents a system level overview of these… ▽ More As the number of user equipments increases in fifth generation (5G) and beyond, it is desired to densify the cellular network with auxiliary nodes assisting the base stations. Examples of these nodes are integrated access and backhaul (IAB) nodes, network-controlled repeaters (NCRs) and reconfigurable intelligent surfaces (RISs). In this context, this work presents a system level overview of these three nodes. Moreover, this work evaluates through simulations the impact of network planning aiming at enhancing the performance of a network used to cover an outdoor sport event. We show that, in the considered scenario, in general, IAB nodes provide an improved signal to interference-plus-noise ratio and throughput, compared to NCRs and RISs. However, there are situations where NCR outperforms IAB due to higher level of interference caused by the latter. Finally, we show that the deployment of these nodes in unmanned aerial vehicles (UAVs) also achieves performance gains due to their aerial mobility. However, UAV constraints related to aerial deployment may prevent these nodes from reaching results as good as the ones achieved by their stationary deployment. △ Less

Submitted 3 October, 2024; originally announced October 2024.

Comments: Paper submitted to IEEE Systems Journal

arXiv:2409.19371 [pdf, other]

Efficient Semantic Diffusion Architectures for Model Training on Synthetic Echocardiograms

Authors: David Stojanovski, Mariana da Silva, Pablo Lamata, Arian Beqiri, Alberto Gomez

Abstract: We investigate the utility of diffusion generative models to efficiently synthesise datasets that effectively train deep learning models for image analysis. Specifically, we propose novel $Γ$-distribution Latent Denoising Diffusion Models (LDMs) designed to generate semantically guided synthetic cardiac ultrasound images with improved computational efficiency. We also investigate the potential of… ▽ More We investigate the utility of diffusion generative models to efficiently synthesise datasets that effectively train deep learning models for image analysis. Specifically, we propose novel $Γ$-distribution Latent Denoising Diffusion Models (LDMs) designed to generate semantically guided synthetic cardiac ultrasound images with improved computational efficiency. We also investigate the potential of using these synthetic images as a replacement for real data in training deep networks for left-ventricular segmentation and binary echocardiogram view classification tasks. We compared six diffusion models in terms of the computational cost of generating synthetic 2D echo data, the visual realism of the resulting images, and the performance, on real data, of downstream tasks (segmentation and classification) trained using these synthetic echoes. We compare various diffusion strategies and ODE solvers for their impact on segmentation and classification performance. The results show that our propose architectures significantly reduce computational costs while maintaining or improving downstream task performance compared to state-of-the-art methods. While other diffusion models generated more realistic-looking echo images at higher computational cost, our research suggests that for model training, visual realism is not necessarily related to model performance, and considerable compute costs can be saved by using more efficient models. △ Less

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.16218 [pdf, other]

Problem-oriented AutoML in Clustering

Authors: Matheus Camilo da Silva, Gabriel Marques Tavares, Eric Medvet, Sylvio Barbon Junior

Abstract: The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiveness across diverse clustering tasks. In contrast,… ▽ More The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiveness across diverse clustering tasks. In contrast, PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components based on the specific context and goals of their task. At its core, PoAC employs a surrogate model trained on a large meta-knowledge base of previous clustering datasets and solutions, enabling it to infer the quality of new clustering pipelines and synthesize optimal solutions for unseen datasets. Unlike many AutoML frameworks that are constrained by fixed evaluation metrics and algorithm sets, PoAC is algorithm-agnostic, adapting seamlessly to different clustering problems without requiring additional data or retraining. Experimental results demonstrate that PoAC not only outperforms state-of-the-art frameworks on a variety of datasets but also excels in specific tasks such as data visualization, and highlight its ability to dynamically adjust pipeline configurations based on dataset complexity. △ Less

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2407.16624 [pdf, other]

Semantic Change Characterization with LLMs using Rhetorics

Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski

Abstract: Languages continually evolve in response to societal events, resulting in new terms and shifts in meanings. These changes have significant implications for computer applications, including automatic translation and chatbots, making it essential to characterize them accurately. The recent development of LLMs has notably advanced natural language understanding, particularly in sense inference and re… ▽ More Languages continually evolve in response to societal events, resulting in new terms and shifts in meanings. These changes have significant implications for computer applications, including automatic translation and chatbots, making it essential to characterize them accurately. The recent development of LLMs has notably advanced natural language understanding, particularly in sense inference and reasoning. In this paper, we investigate the potential of LLMs in characterizing three types of semantic change: dimension, relation, and orientation. We achieve this by combining LLMs' Chain-of-Thought with rhetorical devices and conducting an experimental assessment of our approach using newly created datasets. Our results highlight the effectiveness of LLMs in capturing and analyzing semantic changes, providing valuable insights to improve computational linguistic applications. △ Less

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.15591 [pdf]

FAIR evaluation of ten widely used chemical datasets: Lessons learned and recommendations

Authors: Marcos Da Silveira, Oona Freudenthal, Louis Deladiennee

Abstract: This document focuses on databases disseminating data on (hazardous) substances found on the North American and the European (EU) market. The goal is to analyse the FAIRness (Findability, Accessibility, Interoperability and Reusability) of published open data on these substances and to qualitatively evaluate to what extend the selected databases already fulfil the criteria set out in the commissio… ▽ More This document focuses on databases disseminating data on (hazardous) substances found on the North American and the European (EU) market. The goal is to analyse the FAIRness (Findability, Accessibility, Interoperability and Reusability) of published open data on these substances and to qualitatively evaluate to what extend the selected databases already fulfil the criteria set out in the commission draft regulation on a common data chemicals platform. We implemented two complementary approaches: Manual, and Automatic. The manual approach is based on online questionnaires. These questionnaires provide a structured approach to evaluating FAIRness by guiding users through a series of questions related to the FAIR principles. They are particularly useful for initiating discussions on FAIR implementation within research teams and for identifying areas that require further attention. Automated tools for FAIRness assessment, such as F-UJI and FAIR Checker, are gaining prominence and are continuously under development. Unlike manual tools, automated tools perform a series of tests automatically starting from a dereferenceable URL to the data resource to be evaluated. We analysed ten widely adopted datasets managed in Europe and North America. The highest score from automatic analysis was 54/100. The manual analysis shows that several FAIR metrics were satisfied, but not detectable by automatic tools because there is no metadata, or the format of the information was not a standard one. Thus, it was not interpretable by the tool. We present the details of the analysis and tables summarizing the outcomes, the issues, and the suggestions to address these issues. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.02876 [pdf, other]

Prävention und Beseitigung von Fehlerursachen im Kontext von unbemannten Fahrzeugen

Authors: Aron Schnakenbeck, Christoph Sieber, Luis Miguel Vieira da Silva, Felix Gehlhoff, Alexander Fay

Abstract: Mobile robots, becoming increasingly autonomous, are capable of operating in diverse and unknown environments. This flexibility allows them to fulfill goals independently and adapting their actions dynamically without rigidly predefined control codes. However, their autonomous behavior complicates guaranteeing safety and reliability due to the limited influence of a human operator to accurately su… ▽ More Mobile robots, becoming increasingly autonomous, are capable of operating in diverse and unknown environments. This flexibility allows them to fulfill goals independently and adapting their actions dynamically without rigidly predefined control codes. However, their autonomous behavior complicates guaranteeing safety and reliability due to the limited influence of a human operator to accurately supervise and verify each robot's actions. To ensure autonomous mobile robot's safety and reliability, which are aspects of dependability, methods are needed both in the planning and execution of missions for autonomous mobile robots. In this article, a twofold approach is presented that ensures fault removal in the context of mission planning and fault prevention during mission execution for autonomous mobile robots. First, the approach consists of a concept based on formal verification applied during the planning phase of missions. Second, the approach consists of a rule-based concept applied during mission execution. A use case applying the approach is presented, discussing how the two concepts complement each other and what contribution they make to certain aspects of dependability. △ Less

Submitted 4 November, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: Language: German. Dieser Beitrag wird eingereicht in: "dtec.bw-Beiträge der Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg: Forschungsaktivitäten im Zentrum für Digitalisierungs- und Technologieforschung der Bundeswehr dtec.bw"

arXiv:2407.02669 [pdf, other]

Impact of Network Deployment on the Performance of NCR-assisted Networks

Authors: Gabriel C. M. da Silva, Diego A. Sousa, Victor F. Monteiro, Darlan C. Moreira, Tarcisio F. Maciel, Fco. Rafael M. Lima, Behrooz Makki

Abstract: To address the need of coverage enhancement in the fifth generation (5G) of wireless cellular telecommunications, while taking into account possible bottlenecks related to deploying fiber based backhaul (e.g., required cost and time), the 3rd generation partnership project (3GPP) proposed in Release 18 the concept of network-controlled repeaters (NCRs). NCRs enhance previous radio frequency (RF) r… ▽ More To address the need of coverage enhancement in the fifth generation (5G) of wireless cellular telecommunications, while taking into account possible bottlenecks related to deploying fiber based backhaul (e.g., required cost and time), the 3rd generation partnership project (3GPP) proposed in Release 18 the concept of network-controlled repeaters (NCRs). NCRs enhance previous radio frequency (RF) repeaters by exploring beamforming transmissions controlled by the network through side control information. In this context, this paper introduces the concept of NCR. Furthermore, we present a system level model that allows the performance evaluation of an NCR-assisted network. Finally, we evaluate the network deployment impact on the performance of NCR-assisted networks. As we show, with proper network planning, NCRs can boost the signal to interference-plus-noise ratio (SINR) of the user equipments (UEs) in a poor coverage of a macro base station. Furthermore, celledge UEs and uplink (UL) communications are the ones that benefit the most from the presence of NCRs. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Paper accepted for publication in the conference proceedings of "19th International Symposium on Wireless Communication Systems" (ISWCS)

arXiv:2406.17792 [pdf, other]

Applications of interpretable deep learning in neuroimaging: a comprehensive review

Authors: Lindsay Munroe, Mariana da Silva, Faezeh Heidari, Irina Grigorescu, Simon Dahan, Emma C. Robinson, Maria Deprez, Po-Wah So

Abstract: Clinical adoption of deep learning models has been hindered, in part, because the black-box nature of neural networks leads to concerns regarding their trustworthiness and reliability. These concerns are particularly relevant in the field of neuroimaging due to the complex brain phenotypes and inter-subject heterogeneity often encountered. The challenge can be addressed by interpretable deep learn… ▽ More Clinical adoption of deep learning models has been hindered, in part, because the black-box nature of neural networks leads to concerns regarding their trustworthiness and reliability. These concerns are particularly relevant in the field of neuroimaging due to the complex brain phenotypes and inter-subject heterogeneity often encountered. The challenge can be addressed by interpretable deep learning (iDL) methods that enable the visualisation and interpretation of the inner workings of deep learning models. This study systematically reviewed the literature on neuroimaging applications of iDL methods and critically analysed how iDL explanation properties were evaluated. Seventy-five studies were included, and ten categories of iDL methods were identified. We also reviewed five properties of iDL explanations that were analysed in the included studies: biological validity, robustness, continuity, selectivity, and downstream task performance. We found that the most popular iDL approaches used in the literature may be sub-optimal for neuroimaging data, and we discussed possible future directions for the field. △ Less

Submitted 30 May, 2024; originally announced June 2024.

arXiv:2406.13128 [pdf, other]

A New Approach for Evaluating and Improving the Performance of Segmentation Algorithms on Hard-to-Detect Blood Vessels

Authors: João Pedro Parella, Matheus Viana da Silva, Cesar Henrique Comin

Abstract: Many studies regarding the vasculature of biological tissues involve the segmentation of the blood vessels in a sample followed by the creation of a graph structure to model the vasculature. The graph is then used to extract relevant vascular properties. Small segmentation errors can lead to largely distinct connectivity patterns and a high degree of variability of the extracted properties. Nevert… ▽ More Many studies regarding the vasculature of biological tissues involve the segmentation of the blood vessels in a sample followed by the creation of a graph structure to model the vasculature. The graph is then used to extract relevant vascular properties. Small segmentation errors can lead to largely distinct connectivity patterns and a high degree of variability of the extracted properties. Nevertheless, global metrics such as Dice, precision, and recall are commonly applied for measuring the performance of blood vessel segmentation algorithms. These metrics might conceal important information about the accuracy at specific regions of a sample. To tackle this issue, we propose a local vessel salience (LVS) index to quantify the expected difficulty in segmenting specific blood vessel segments. The LVS index is calculated for each vessel pixel by comparing the local intensity of the vessel with the image background around the pixel. The index is then used for defining a new accuracy metric called low-salience recall (LSRecall), which quantifies the performance of segmentation algorithms on blood vessel segments having low salience. The perspective provided by the LVS index is used to define a data augmentation procedure that can be used to improve the segmentation performance of convolutional neural networks. We show that segmentation algorithms having high Dice and recall values can display very low LSRecall values, which reveals systematic errors of these algorithms for vessels having low salience. The proposed data augmentation procedure is able to improve the LSRecall of some samples by as much as 25%. The developed methodology opens up new possibilities for comparing the performance of segmentation algorithms regarding hard-to-detect blood vessels as well as their capabilities for vascular topology preservation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.07962 [pdf, other]

doi 10.1109/ETFA61755.2024.10710783

Toward a Method to Generate Capability Ontologies from Natural Language Descriptions

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff, Alexander Fay

Abstract: To achieve a flexible and adaptable system, capability ontologies are increasingly leveraged to describe functions in a machine-interpretable way. However, modeling such complex ontological descriptions is still a manual and error-prone task that requires a significant amount of effort and ontology expertise. This contribution presents an innovative method to automate capability ontology modeling… ▽ More To achieve a flexible and adaptable system, capability ontologies are increasingly leveraged to describe functions in a machine-interpretable way. However, modeling such complex ontological descriptions is still a manual and error-prone task that requires a significant amount of effort and ontology expertise. This contribution presents an innovative method to automate capability ontology modeling using Large Language Models (LLMs), which have proven to be well suited for such tasks. Our approach requires only a natural language description of a capability, which is then automatically inserted into a predefined prompt using a few-shot prompting technique. After prompting an LLM, the resulting capability ontology is automatically verified through various steps in a loop with the LLM to check the overall correctness of the capability ontology. First, a syntax check is performed, then a check for contradictions, and finally a check for hallucinations and missing ontology elements. Our method greatly reduces manual effort, as only the initial natural language description and a final human review and possible correction are necessary, thereby streamlining the capability ontology generation process. △ Less

Submitted 18 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: \c{opyright} 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2405.04841 [pdf, other]

xMTrans: Temporal Attentive Cross-Modality Fusion Transformer for Long-Term Traffic Prediction

Authors: Huy Quang Ung, Hao Niu, Minh-Son Dao, Shinya Wada, Atsunori Minamikawa

Abstract: Traffic predictions play a crucial role in intelligent transportation systems. The rapid development of IoT devices allows us to collect different kinds of data with high correlations to traffic predictions, fostering the development of efficient multi-modal traffic prediction models. Until now, there are few studies focusing on utilizing advantages of multi-modal data for traffic predictions. In… ▽ More Traffic predictions play a crucial role in intelligent transportation systems. The rapid development of IoT devices allows us to collect different kinds of data with high correlations to traffic predictions, fostering the development of efficient multi-modal traffic prediction models. Until now, there are few studies focusing on utilizing advantages of multi-modal data for traffic predictions. In this paper, we introduce a novel temporal attentive cross-modality transformer model for long-term traffic predictions, namely xMTrans, with capability of exploring the temporal correlations between the data of two modalities: one target modality (for prediction, e.g., traffic congestion) and one support modality (e.g., people flow). We conducted extensive experiments to evaluate our proposed model on traffic congestion and taxi demand predictions using real-world datasets. The results showed the superiority of xMTrans against recent state-of-the-art methods on long-term traffic predictions. In addition, we also conducted a comprehensive ablation study to further analyze the effectiveness of each module in xMTrans. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted at MDM 2024

arXiv:2404.17524 [pdf, other]

doi 10.1109/ETFA61755.2024.10710775

On the Use of Large Language Models to Generate Capability Ontologies

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff, Alexander Fay

Abstract: Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engine… ▽ More Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engineers / ontology experts. Therefore, this paper investigates how LLMs can be used to create capability ontologies. We present a study with a series of experiments in which capabilities with varying complexities are generated using different prompting techniques and with different LLMs. Errors in the generated ontologies are recorded and compared. To analyze the quality of the generated ontologies, a semi-automated approach based on RDF syntax checking, OWL reasoning, and SHACL constraints is used. The results of this study are very promising because even for complex capabilities, the generated ontologies are almost free of errors. △ Less

Submitted 18 October, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: \c{opyright} 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2404.06256 [pdf, other]

Label-Efficient 3D Object Detection For Road-Side Units

Authors: Minh-Quan Dao, Holger Caesar, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Vincent Frémont, Ezio Malis

Abstract: Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted a large research interest thanks to the ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSU), thus minimizing the impact of occlusion. While significant advancement has been made… ▽ More Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted a large research interest thanks to the ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSU), thus minimizing the impact of occlusion. While significant advancement has been made, the data-hungry nature of these methods creates a major hurdle for their real-world deployment, particularly due to the need for annotated RSU data. Manually annotating the vast amount of RSU data required for training is prohibitively expensive, given the sheer number of intersections and the effort involved in annotating point clouds. We address this challenge by devising a label-efficient object detection method for RSU based on unsupervised object discovery. Our paper introduces two new modules: one for object discovery based on a spatial-temporal aggregation of point clouds, and another for refinement. Furthermore, we demonstrate that fine-tuning on a small portion of annotated data allows our object discovery models to narrow the performance gap with, or even surpass, fully supervised models. Extensive experiments are carried out in simulated and real-world datasets to evaluate our method. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: IV 2024

arXiv:2404.03061 [pdf, other]

WebSPL: A Software Product Line for Web Applications

Authors: Maicon Azevedo da Luz, Kleinner Farias

Abstract: Companies developing Web applications have faced an increasing demand for high-quality products with low cost and production time ever smaller. However, developing such applications is still considered a time-consuming and error-prone task, mainly due to the difficulty of promoting the reuse of features (or functionalities) and modules, and the heterogeneity of Web frameworks. Nowadays, companies… ▽ More Companies developing Web applications have faced an increasing demand for high-quality products with low cost and production time ever smaller. However, developing such applications is still considered a time-consuming and error-prone task, mainly due to the difficulty of promoting the reuse of features (or functionalities) and modules, and the heterogeneity of Web frameworks. Nowadays, companies must face ever-changing requirements. Software product lines emerged as an alternative to face this challenge by creating a collection of applications from a core of software assets. Despite the potential, the current literature lacks works that propose a product line for Web applications. This paper, therefore, presents WebSPL, a product line for Web applications that supports the main features found in Wed applications in real-world settings. The proposed WebSPL was evaluated by comparing it with a Web application developed based on a traditional approach. A case study that involves the development of two Web applications enabled data collection. Two Web applications were developed -- one with and another without the support of the proposed WebSPL. We compared these two applications using software design metrics, including complexity, size, duplicate lines, and technical debt. The initial results were encouraging and showed the potential for using WebSPL to support the development of Web applications. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: 6 figures, 3 tables

arXiv:2402.19088 [pdf, other]

Survey in Characterization of Semantic Change

Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski

Abstract: Live languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or \textbf{semantic changes} of words (new meaning to existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer scienc… ▽ More Live languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or \textbf{semantic changes} of words (new meaning to existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer science, these words are relevant to computational linguistics algorithms such as translation, information retrieval, question answering, etc. Semantic changes can potentially impact the quality of the outcomes of these algorithms. Therefore, it is important to understand and characterize these changes formally. The study of this impact is a recent problem that has attracted the attention of the computational linguistics community. Several approaches propose methods to detect semantic changes with good precision, but more effort is needed to characterize how the meaning of words changes and to reason about how to reduce the impact of semantic change. This survey provides an understandable overview of existing approaches to the \textit{characterization of semantic changes} and also formally defines three classes of characterizations: if the meaning of a word becomes more general or narrow (change in dimension) if the word is used in a more pejorative or positive/ameliorated sense (change in orientation), and if there is a trend to use the word in a, for instance, metaphoric or metonymic context (change in relation). We summarized the main aspects of the selected publications in a table and discussed the needs and trends in the research activities on semantic change characterization. △ Less

Submitted 18 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.15470 [pdf, other]

Some results involving the $A_α$-eigenvalues for graphs and line graphs

Authors: Joao Domingos Gomes da Silva Junior, Carla Silva Oliveira, Liliana Manuela Gaspar C. da Costa

Abstract: Let $G$ be a simple graph with adjacency matrix $A(G)$, signless Laplacian matrix $Q(G)$, degree diagonal matrix $D(G)$ and let $l(G)$ be the line graph of $G$. In 2017, Nikiforov defined the $A_α$-matrix of $G$, $A_α(G)$, as a linear convex combination of $A(G)$ and $D(G)$, the following way, $A_α(G):=αA(G)+(1-α)D(G),$ where $α\in[0,1]$. In this paper, we present some bounds for the eigenvalues o… ▽ More Let $G$ be a simple graph with adjacency matrix $A(G)$, signless Laplacian matrix $Q(G)$, degree diagonal matrix $D(G)$ and let $l(G)$ be the line graph of $G$. In 2017, Nikiforov defined the $A_α$-matrix of $G$, $A_α(G)$, as a linear convex combination of $A(G)$ and $D(G)$, the following way, $A_α(G):=αA(G)+(1-α)D(G),$ where $α\in[0,1]$. In this paper, we present some bounds for the eigenvalues of $A_α(G)$ and for the largest and smallest eigenvalues of $A_α(l(G))$. Extremal graphs attaining some of these bounds are characterized. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: 18 pages, 5 figures, 3 tables

MSC Class: 05C05

arXiv:2402.11775 [pdf, other]

FOD-Swin-Net: angular super resolution of fiber orientation distribution using a transformer-based deep model

Authors: Mateus Oliveira da Silva, Caio Pinheiro Santana, Diedre Santos do Carmo, Letícia Rittner

Abstract: Identifying and characterizing brain fiber bundles can help to understand many diseases and conditions. An important step in this process is the estimation of fiber orientations using Diffusion-Weighted Magnetic Resonance Imaging (DW-MRI). However, obtaining robust orientation estimates demands high-resolution data, leading to lengthy acquisitions that are not always clinically available. In this… ▽ More Identifying and characterizing brain fiber bundles can help to understand many diseases and conditions. An important step in this process is the estimation of fiber orientations using Diffusion-Weighted Magnetic Resonance Imaging (DW-MRI). However, obtaining robust orientation estimates demands high-resolution data, leading to lengthy acquisitions that are not always clinically available. In this work, we explore the use of automated angular super resolution from faster acquisitions to overcome this challenge. Using the publicly available Human Connectome Project (HCP) DW-MRI data, we trained a transformer-based deep learning architecture to achieve angular super resolution in fiber orientation distribution (FOD). Our patch-based methodology, FOD-Swin-Net, is able to bring a single-shell reconstruction driven from 32 directions to be comparable to a multi-shell 288 direction FOD reconstruction, greatly reducing the number of required directions on initial acquisition. Evaluations of the reconstructed FOD with Angular Correlation Coefficient and qualitative visualizations reveal superior performance than the state-of-the-art in HCP testing data. Open source code for reproducibility is available at https://github.com/MICLab-Unicamp/FOD-Swin-Net. △ Less

Submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted for publication at ISBI 2024

Showing 1–50 of 189 results for author: Dao, M